STUDIES ON THE MIGRATION AND ISOLATION OF SELECTED
ANCIENT POPULATIONS OF INDIA - A NON RECOMBINANT
Y CHROMOSOME (NRY) STUDY


Thesis submitted to 
Madurai Kamaraj University, India 
(University with Potential for Excellence) 
for the degree of 


Doctor of Philosophy in Genomics 


December 2012 


By 
A.Syama 


Regn. No. 1892 


Department of Immunology, 
School of Biological Sciences 
Madurai Kamaraj University, 


Madurai — 625 021, India 


CERTIFICATE 


Certified that the thesis “Studies on the migration and isolation of selected ancient 
populations of India- a Non recombinant Y chromosome (NRY) study” submitted by 
A.Syama, Department of Immunology, School of Biological Sciences, Madurai Kamaraj 
University, is a record of research work carried out by her for the degree of Doctor of
Philosophy, under my guidance.


The thesis is the original work of the candidate and to the best of my knowledge, it has not 
been submitted, in part or in full, for any diploma, degree, associateship, fellowship or 
other similar titles in this or any other University. No portion of the thesis is a reproduction
from any other source, published or unpublished, without acknowledgement.




Prof. R.M.Pitchappan (Guide) Prof. G. Marimuthu (Co-Guide) 


Prof. R.M. Pitchappan
Regional Director
The Genographic Project - India
Chettinad Academy of Research and Education
Rajiv Gandhi Salai (OMR), Chennai
Kelambakkam, Kancheepuram


DECLARATION 


I declare that the thesis “Studies on the migration and isolation of selected ancient
populations of India- a Non recombinant Y chromosome (NRY) study” is the
result of the study originally carried out by me under the guidance and supervision of
Prof. R.M. Pitchappan, Professor of Immunology (Rtd), School of Biological
Sciences, Madurai Kamaraj University (University with Potential for Excellence),
Madurai.


This work has not been submitted earlier, in full or in part, for any diploma or
degree in this or any other university. I also declare that no part of the thesis is a
reproduction from any other source, published or unpublished, without
acknowledgement.


Station: Madurai 


Date A.Syama 


Acknowledgements 


Foremost, I would like to express my sincere gratitude to my guide,
Prof. RM. Pitchappan, for his continuous support of my Ph.D study and research, and for
his patience, motivation, enthusiasm and immense knowledge. His guidance helped
me throughout the research and the writing of this thesis. I could not have imagined
having a better advisor and mentor for my Ph.D study.


I thank Dr. Spencer Wells and the Genographic Project, funded by the National
Geographic Society, IBM and the Waitt Family Foundation, for financial support in the
form of a fellowship from September 2006 to December 2012, and also for the technologies,
equipment and consumables provided during this period. I am thankful to the Genographic
team members for all their support and ideas in the work and data analysis.


I am grateful to the professors and collaborators for their help, support and
guidance during sample collection. I take this opportunity to thank Dr. Veeraju
(Andhra University, Vizag), Dr. Velaga Lakshmi (Andhra University, Vizag) and
Dr. Sambasiva Rao (Andhra University, Vizag), who assisted with sampling in Andhra
Pradesh during July 2007. I am also thankful to Air Marshal Chengappa and Mr Manu
for their help and support in sampling populations from Coorg district. My sincere
thanks to Dr. Gangadhar (Manasa Gangotri) for his guidance and support in sampling
in Mysore and Mandya districts. I express my gratitude to Dr Bhat (Manasa Gangotri)
and Mr Arun Kumar Suvarna for their guidance on the logistics of sampling populations
from Mangalore and Udupi. My hearty thanks to Dr Agarkar (Homi Bhabha Centre for
Science Education), Dr V.G Gambhir (Sholapur Science Center), Late Dr. Uma Rao
N (IWSA, Mumbai), Dr Kamalrookh Marolia (K.J Somaiyya College, Mumbai),
Dr. Vidyanand Khandagale (Shivaji University, Mumbai) and Dr Mumtaz Baig
(Amaravati University, Amaravati) for their help in identifying the populations and
their assistance during the Maharashtra sampling. I thank Dr Krishna (Baroda University),
Mr Mahendra Gadvi (Kutch), and Mr Ashok Vyas and Mr Ashok Mehta from Agakhan,
an NGO at Ahmedabad, for their support and assistance to the Genographic team in sampling
Gujarat populations.


I am thankful to Ms Jhansi Rani (Andhra Pradesh), Dr Mahadeva (Mysore and
Mandya of Karnataka), Mr Shibu Chakravarthy (Mangalore, Udupi of Karnataka), Ms
Pallavi Kadam (Mumbai, Maharashtra) and Mr Gajendrasingh Pachlore (Amaravati,
Maharashtra) for their assistance during field work in the respective regions.


My sincere thanks to Prof G Marimuthu for all his support throughout the 
study period. I also thank Dr. Jeyalakshmi for all the help and support during my 
study period. I am grateful to Prof P. Gunasekaran, Prof Sripati Kandula, Prof G. 
Kulundaivelu, Dr K. Manoharan, Dr Hussain Munavar, Dr P.S. Sudhakarswamy, Prof 
G.S. Selvam, Prof G. Shanmugam, Dr Kumaresan and Dr Balakrishnan for all their 
help and support. 


I specially thank my friends and colleagues Mr V.S Arun and Mr G. Arun Kumar
for all their support at all times, both professionally and personally. I thank Mr V.S
Arun for being a morale booster during my stay in Madurai Kamaraj University. I am
thankful to Mr G. Arun Kumar for all his suggestions while proofreading the thesis. I
also thank Ms Meera B for all her help in proofreading this thesis.


I would like to acknowledge my senior, Dr. V.J Kavitha, for all her teaching
and laboratory guidance. I would like to thank Mr Lieshman John Thamburaj and Mr
Kalyana Raman for all their support and suggestions during the Ph.D term. I am
thankful to Mrs Mathuram for all her help with administrative matters. I acknowledge all
the help extended by Mr Krishnamoorthy, Mr Malai Raja and Mr T.Sundaramoorthy
in the laboratory.


I am thankful to Prof Kailash Paliwal and Prof Muthu Subrahmanian for their
encouragement and help in making me pursue my Ph.D.


I am grateful to all my friends at Madurai Kamaraj University for making my
stay in the university and hostel pleasant. I wish to thank my friends Ms Meera B,
Ms Lanvanya Pushpam, Ms Preeti, Dr. Nagarani Mahajan, Mr Bharat Bhusan, Mr
Ram Kumar, Mr Karthik and Mr Kannan for all the help they gave me and for welcoming
me as a friend. I am also very grateful to their families for their affection toward me.


I would not have contemplated this road if not for my parents, Mr A.S.N
Murthy and Mrs A. Vidyavathy, who instilled within me a love of creative pursuits,
science and language, all of which find a place in this thesis. To my parents, thank
you. I am indebted to my uncle Dr G.V Subrahmaniyam and aunt Mrs G Kalyani for
all their encouraging words and the trust they showed in me in pursuing my Ph.D. I
am also thankful to my uncle Mr A.S.R Kumar and Mrs A Gowri Kamerswari for all
their encouraging words and the love they always bestow on me. I also thank my family
members Mr Krishna Subramaniyam and Dr Lakshmi Subrahmaniyam for their
support, comments and suggestions in all aspects of my Ph.D. I am thankful to my
cousins Ms G. Swetha Kameswari, Mr A. Ananth, Ms Shanti Subramaniyama and Mr
Aditya Subramaniyan, and to my grandparents Mr M Krishna, Mrs M Kameswari and
Mrs A Syamala, for being stress busters whenever I needed them.


Table of Contents 


1. Introduction

2. Review of literature
2.1 Land and its people
2.1.1 Archaeological time scale of India
2.1.1.1 Late Pleistocene and first cultures of India
2.1.1.2 Middle and Upper Palaeolithic ages in Deccan India
2.1.1.3 Mesolithic and technology advancement
2.1.1.4 Neolithic and the Indus Valley civilization
2.1.1.5 Neolithic farming and traditions in North and Northeast India
2.1.1.6 Neolithic and farming in Deccan
2.1.1.7 Chalcolithic ages in Deccan
2.2 Origin/Spread of people and languages in India - evidences from Linguistic studies
2.2.1 Indo-European languages
2.2.2 Dravidian languages
2.2.3 Austro-Asiatic languages
2.2.4 Tibeto-Burman languages
2.3 Castes and tribes of India
2.4 NRY chromosome as a tool to study population history
2.4.1 Structure of Y chromosome
2.4.2 Mutations of interest on Y chromosome
2.4.3 Mutation rates
2.5 Nomenclature system for YSNPs and YSTRs
2.6 Evolution of NRY HG markers and its global distribution
2.6.1 NRY HGs of African origin
2.6.2 Phylogeography in South Asia
2.6.3 Major HGs of Middle East and Caucasus

2.7 Previous studies from this laboratory

3. Materials and Methods
3.1 Sampling
3.2 Topography of sampled locations
3.2.1 Gujarat
3.2.2 Maharashtra
3.2.3 Karnataka
3.2.4 Andhra Pradesh
3.3 Sample collection and DNA extraction
3.4 Genotyping
3.4.1 DNA dilution
3.4.2 DNA quantification
3.4.3 NRY-SNP genotyping
3.4.4 YSTR genotyping
3.5 Quality Control
3.6 Statistical analysis


4. Results
4.1 The logic of studying populations from Deccan and Gujarat
4.1.1 Gujarat - The coastal gateway to India
4.1.1.1 NRY Haplogroup frequency distribution in Gujarat study populations
4.1.1.2 Neighbour joining tree
4.1.1.3 Analysis of Molecular Variance (AMOVA)
4.1.1.4 Principal Component Analysis (PCA)
4.1.1.5 Multidimensional Scaling (MDS)
4.1.1.6 Phylogenetic network analysis
4.1.1.7 Mismatch distributions
4.1.1.8 BATWING analysis
4.1.2 Maharashtra - interface of north and south
4.1.2.1 NRY Haplogroup frequency distribution in Maharashtra
4.1.2.2 Neighbour Joining tree
4.1.2.3 AMOVA
4.1.2.4 Principal component analysis
4.1.2.5 Multidimensional scaling
4.1.2.6 Phylogenetic network
4.1.2.7 Mismatch distribution
4.1.2.8 BATWING analysis and ASD analysis
4.1.3 Karnataka - The land of mid-Western Ghats
4.1.3.1 Y haplogroup frequency distribution
4.1.3.2 Neighbour Joining tree
4.1.3.3 AMOVA
4.1.3.4 Principal Component Analysis and MDS
4.1.3.5 Phylogenetic networks
4.1.3.6 Mismatch distributions
4.1.3.7 BATWING age estimates
4.1.4 Godavari Delta of Coastal Andhra Pradesh - an ancient fertile river-fed settlement of Deccan
4.1.4.1 Y haplogroup frequency
4.1.4.2 Neighbour joining tree
4.1.4.3 AMOVA
4.1.4.4 Principal Component Analysis
4.1.4.5 Multidimensional scaling
4.1.4.6 Phylogenetic networks
4.1.4.7 Mismatch distributions
4.1.4.8 BATWING and ASD analysis
4.2 NRY HG L1-M27/76 story and India
4.2.1 Distribution of NRY HG L1-M27/76
4.2.2 L1-M27/76 17 YSTR variance distribution

4.2.3 L1-M27/76 haplotype network analysis
4.2.4 Multidimensional scaling
4.2.5 STR based age estimates
4.2.6 BATWING estimates of population parameters and phylogenetic tree
4.3 Genetic footprints of HG L3*-M357 - An enigma
4.3.1 Distribution of NRY HG L3*-M357
4.3.2 Comparison of HG L3*-M357 with the global populations
4.4 Origin and dispersal of NRY HG J2a*-M410
4.4.1 Phylogeography of HG J2a*-M410
4.4.2 Phylogenetic network analysis
4.4.3 Comparison with global populations
4.4.4 ASD and BATWING based age estimate
4.5 Studies on NRY HG J2b-M221/102
4.5.1 Phylogeography of HG J2b-M221/102 in India
4.5.2 Phylogenetic network analysis
4.5.3 Comparison of HG J2b-M221/102 with global populations
4.5.4 ASD and BATWING based age estimate
4.6 Nattukottai Chettiars - A case study on caste formation
4.6.1 NRY HG frequency distribution in various sub-clans of NC
4.6.2 Neighbour Joining tree computations
4.6.3 Phylogenetic Network and mismatch distribution analysis
4.6.4 K mean clustering analysis
4.6.5 BATWING analysis

5. Discussion
5.1 The parameters of isolation in different regions differ
5.2 Various tribes and castes of a given region may have different origins
5.2.1 Not all the Brahmins had common ancestry
5.3 The distribution of L1 and the story of Dravidian
5.3.1 The Dravidian
5.3.2 NRY HG L1 chromosomal evidences
5.4 The HG L3*-M357: Brokpa and their Aryan claim
5.5 The story of NRY HG Js as a marker for agricultural expansion
5.6 The story of Nattukottai Chettiar and fidelity of patriliny

6. Conclusions

7. References

8. Appendix

List of Figures

Figure 1: Topographic map of India
Figure 2: Schematic representation of Y chromosome showing the position of YSTR markers employed in this study
Figure 3: Phylogenetic tree of Y chromosomal haplogroups: Adapted from Karafet et al., 2008
Figure 4a-4g: Global frequency distribution map of NRY HG C, F, H, L, O, R, J and its subtypes
Figure 5: Map of India showing the sample locations of various study populations
Figure 6: Absolute Quantification: PCR amplification plot
Figure 7: Taqman assay: Allelic Discrimination plot
Figure 8: YSTR assay: Data output as seen in Gene Mapper v3.1 software
Figure 9: Pie charts depicting NRY HG frequencies in the populations of Gujarat
Figure 10a: NJ tree based on NRY HG Fst distances for Gujarat study populations
Figure 10b: NJ tree based on NRY HG Rst distances for Gujarat study populations
Figure 11a: Principal Component Analysis of NRY HG frequencies of study populations from Gujarat
Figure 11b: Scree plot for PCA (Figure 11a) components
Figure 12: Multi Dimensional Scaling of NRY-STR Rst distances for populations of Gujarat
Figure 13a-13c: Reduced median phylogenetic network analysis of Gujarat study populations: HGs H1a*, H2, R1a1a
Figure 14a-14h: Mismatch distributions of NRY HG CR, C5, H1a*, H2, J2a*, Q1a3, R1a1a, R2
Figure 15: Batwing phylogenetic tree for populations of Gujarat
Figure 16: Pie charts depicting NRY HG frequencies in the populations of Maharashtra


Figure 17a: NJ tree based on NRY HG Fst distances for Maharashtra study populations
Figure 17b: NJ tree based on YSTR Rst distances for Maharashtra study populations
Figure 18a: Principal Component Analysis of NRY HG frequencies of study populations from Maharashtra
Figure 18b: Scree plot for PCA (Figure 18a) components
Figure 19: Multi Dimensional Scaling of NRY-STR Rst distances for populations of Maharashtra
Figure 20a-20e: Phylogenetic network analysis of Maharashtra study populations: HGs H1a*, J2a*, L1, O2a, R1a1a
Figure 21a-21h: Mismatch distributions of NRY HGs C5, H1a*, H2, J2a*, L1, O2a, R1a1a, R2
Figure 22: Batwing phylogenetic tree for populations of Maharashtra
Figure 23: Pie charts depicting NRY HG frequencies in the populations of Karnataka
Figure 24a: NJ tree based on NRY HG Fst distances for Karnataka study populations
Figure 24b: NJ tree based on NRY STR Rst distances for Karnataka study populations
Figure 25a: Principal Component Analysis of NRY HG frequencies of study populations from Karnataka
Figure 25b: Scree plot for PCA (Figure 25a) components
Figure 26: Multi Dimensional Scaling of NRY-STR Rst distances for populations of Karnataka
Figure 27a-27f: Phylogenetic network analysis of Karnataka study populations: HGs F*, H1a*, L1, R1a1a, R2
Figure 28a-28k: Mismatch distributions of NRY HGs F*, C5, H*, J2a*, J2b, L1, L3*, R1a1a, R2 from Karnataka study populations
Figure 29: BATWING phylogenetic tree for populations of Karnataka
Figure 30: Pie charts depicting NRY HG frequencies in the populations of Andhra Pradesh


Figure 31a: NJ tree based on NRY HG Fst distances for Andhra Pradesh study populations
Figure 31b: NJ tree based on NRY STR Rst distances for Andhra Pradesh study populations
Figure 32a: Principal Component Analysis based on YHG frequencies of study populations from Andhra Pradesh
Figure 32b: Scree plot for PCA (Figure 32a) components
Figure 33: Multi Dimensional Scaling of NRY-STR Rst distances for populations of Andhra Pradesh
Figure 34a-34e: Phylogenetic networks for HGs F*-M89, H1a*-M82, L1-M27/76, O2a-M95, R1a1a-M17
Figure 35a-35j: Mismatch distributions of NRY HG F*-M89, HG C5-M356, HG H*-M89, HG H1a*-M82, HG H2-Apt, HG J2a-M410, HG J2b-M221/102, HG L1-M27/76, HG O2a-M95, HG R1a1a-M17, HG R2-M124
Figure 36: Batwing phylogenetic tree for populations of Andhra Pradesh
Figure 37a: Contour map of NRY HG L1-M27/76 based on its frequency
Figure 37b: Contour map of NRY HG L1-M27/76 based on YSTR variance
Figure 38: Reduced median phylogenetic network for all (N = 611) HG L1-M27/76 pan Indian populations (17 YSTRs)
Figure 39: Reduced median phylogenetic network for all (N = 404) HG L1-M27/76 pan Indian populations (Tamil Nadu truncated) with 17 YSTRs
Figure 40: Reduced median phylogenetic HG L1 network for all Indian and literature data (9 YSTRs)
Figure 41: Reduced median phylogenetic network for all (N = 404) HG L1-M27/76 pan Indian populations (Tamil Nadu truncated) with 8 YSTRs
Figure 42: Multidimensional scaling plot based on HG L1 Rst distances for all Indian and literature data (9 YSTRs)
Figure 43: NJ tree based on HG L1 Rst distances for all Indian and literature data (9 YSTRs)


Figure 44a: Contour map showing the distribution of HG L3* in India based on its frequency
Figure 44b: Contour map based on YSTR variance of HG L3* in India
Figure 45: Reduced median phylogenetic network for HG L3* among Indian populations (17 YSTRs)
Figure 46: Reduced median phylogenetic network for HG L3*-M357 with global populations (9 YSTRs)
Figure 47: YSTR Rst based MDS plot for HG L3* for Indian and global populations
Figure 48: BATWING based phylogenetic tree for HG L3* global data
Figure 49a: Contour map showing the distribution of HG J2a* in India based on its frequency
Figure 49b: Contour map based on YSTR variance of HG J2a* in India
Figure 50a-50b: Reduced median phylogenetic network for HG J2a*-M410 for Indian populations (17 YSTRs)
Figure 51a-51b: Reduced median phylogenetic network for HG J2a*-M410 based on 8 YSTRs
Figure 52: Rst based MDS plot for HG J2a* for Indian along with global populations with 8 YSTRs
Figure 53: NJ tree based on YSTR HG J2a* Rst distances with global populations with 8 YSTRs
Figure 54: Mismatch distribution based on 8 YSTR loci of HG J2a* in all study states
Figure 55a: Contour map showing the distribution of HG J2b in India based on its frequency
Figure 55b: Contour map based on YSTR variance of HG J2b in India
Figure 56: Reduced median phylogenetic network for HG J2b for Indian study populations (17 YSTRs)
Figure 57: Reduced median phylogenetic network for HG J2b for Indian and global populations (8 YSTRs)
Figure 58: Rst based MDS plot for HG J2b for Indian along with global populations for 8 YSTRs


Figure 59: NJ tree based on HG J2b YSTR-Rst distances with Indian and global populations for 8 YSTRs
Figure 60: Mismatch distribution based on 8 YSTR loci of HG J2b-M221/102 in all study states
Figure 61: Map of Chettinad area in Tamil Nadu showing locations of various kovils
Figure 62a: NJ tree based on NRY HG Fst distances for subclans of Nattukottai Chettiar
Figure 62b: NJ tree based on NRY STR Rst distances for subclans of Nattukottai Chettiar
Figure 63: Mismatch distributions of NRY HGs among Nattukottai Chettiar
Figure 64: Reduced median phylogenetic network of NRY HG L1-M27/76 among Nattukottai Chettiar
Figure 65: K mean clustering at K = 2, 3, 4, 5, 6, 7, 8, 9
Figure 66: BATWING based phylogenetic tree for sub-clans of Nattukottai Chettiar
Figure 67: PCA plot based on NRY HG frequencies of all the study states
Figure 68: Map of the Dravidian and Munda languages (From Trautmann, 1981:10)
Figure 69: Y chromosome phylogenetic tree of HG L with its defining SNP
Figure 70: Y chromosome phylogenetic tree of HG J with its defining SNP


List of Tables

Table 1a-1g: Global NRY HG frequencies of C, F, H, L, O, R, J and its subtypes
Table 2: Demographic table for various study populations
Table 3: NRY HG frequency table for the study populations from Gujarat, Maharashtra, Karnataka and Andhra Pradesh
Table 4a-4d: Fisher exact test table for the study populations from Gujarat, Maharashtra, Karnataka and Andhra Pradesh
Table 5: NRY based AMOVA of various sub groupings of Gujarat study population
Table 6: Pairwise distance matrix based on NRY HG Fst and YSTR Rst distances for Gujarat study populations
Table 7: BATWING based ancestral effective population sizes for all the study states
Table 8: BATWING based Ancestral Effective population size, TMRCA and population expansion times for all the study states
Table 9: BATWING based age estimate for YSNPs in Gujarat study population
Table 10: NRY based AMOVA of various sub groupings of Maharashtra study population
Table 11: Pairwise distance matrix based on NRY HG Fst and YSTR Rst distances for Maharashtra study populations
Table 12: NRY based AMOVA of various sub grouping of Karnataka study populations
Table 13: Pairwise distance matrix based on NRY HG Fst and YSTR Rst distances for Karnataka study populations
Table 14: NRY based AMOVA of various sub grouping of Andhra Pradesh study populations
Table 15: Pairwise distance matrix based on NRY HG Fst and YSTR Rst distances for Andhra Pradesh study populations
Table 16a: Frequency distribution of NRY HG L1-M27/76 in India


Table 16b: Frequency of Tamil Nadu populations before truncating
Table 17: Frequencies of HG L1-M27/76 in various language speakers of India showing higher representation in South Dravidian and Indo-European language speakers
Table 18a: SSD distances of HG L1 computed from central median haplotype among various study states
Table 18b: SSD distances of HG L1 computed from central median haplotype among various language families
Table 19: Composition of central median of HG L1 network with various YSTR numbers considered
Table 20: HG L1-YSTR variance and ASD based age estimates from Indian study regions and literature data
Table 21: Effective population sizes and population expansion times of various regions based on BATWING analysis of HG L1 for 9 YSTRs
Table 22: Effective population sizes, TMRCA and population expansion times of various regions based on BATWING analysis of HG L1 for 17 YSTRs
Table 23: Frequency distribution of NRY HG L3*-M357 in India
Table 24: Frequency of NRY HG L3*-M357 in various language speakers of India
Table 25a: SSD distances of HG L3*-M357 computed from median haplotype among various study states
Table 25b: SSD distances of HG L3*-M357 computed from median haplotype among various global populations
Table 26: Composition of various populations represented in network
Table 27: HG L3*-M357 ASD and BATWING based age estimates in different geographical regions
Table 28: NRY HG J2a*-M410 frequency distribution in India
Table 29: NRY HG J2a*-M410 in various language speakers of India
Table 30: HG J2a*-M410 YSTR variance and ASD ± SE in different regions


Table 31: Effective population sizes, TMRCA and population expansion times based on BATWING analysis of HG J2a*-M410 for 8 YSTRs
Table 32: NRY HG J2b-M221/102 frequency distribution in India
Table 33: NRY HG J2b-M221/102 ASD based age estimates in different geographical regions
Table 34: Effective population sizes, TMRCA and population expansion times based on BATWING analysis of HG J2b-M221/102 for 8 YSTRs
Table 35: NRY HG frequency table among the sub-clans of Nattukottai Chettiar
Table 36: Results of Fisher Exact Test (FET) to ascertain the reliability of HG frequencies in different sub-clans of Nattukottai Chettiar
Table 37: YSTR variance and ASD ± SE based age estimate for various HGs among study states
Table 38: NRY based AMOVA of various sub groupings for Deccan Indian populations


INTRODUCTION 


1. Introduction 

The Indian subcontinent is the southern extension of mainland Asia, formed mostly
by the Indian Plate protruding into the Indian Ocean. It comprises
present-day India, Pakistan and Bangladesh. This peninsular region covers a vast
area of 4.4 million km² and is delineated by mountain ranges: the Himalayas in the
north, the Hindu Kush in the west and the Arakanese in the east. It is bounded by the
Indian Ocean to the south, the Arabian Sea to the west and the Bay of Bengal to the east
(Fig 1). In addition to these barriers, the perennial rivers of India, all originating in the
Himalayas, such as the Indus, Ganges and Brahmaputra, restricted population movements
within the subcontinent. This left the coastline as a highway for easy movement
in early times, for both animals and man. The first coastal migration of modern
man from Africa to Australia passed along the coasts of India (Wells et al.,
2001; Wells and Read, 2002; Henn et al., 2012).

The arrival of modern man (Homo sapiens sapiens) in the subcontinent has been traced
back to 75,000 Ybp, and that of earlier hominids, including Homo erectus, to
500,000 Ybp (Bongard-Levin, 1979). Isolated remains of Homo erectus at
Hathnora in the Narmada Valley in Central India suggest occupation since the
Middle Pleistocene era (between 500,000 and 200,000 Ybp). The archaeological site in the Soan
River valley in the Sivalik region contains Paleolithic hominid remains (Rendell,
1989). In spite of serious efforts, the remains of modern humans (Homo sapiens
sapiens) are yet to be discovered in India, though the caves of Sri Lanka (Pahiyangala
in Bulathsinhala) have yielded the oldest complete anatomically modern human
skeleton within South Asia. Earlier, two crania were discovered there and dated to
37,000 Ybp (Deraniyagala and Siran, 1996). The antiquity of South Asia is thus well
established; animal and human habitation there has been known from time immemorial.

Figure 1: Topographic map of India

The present-day Republic of India is the seventh-largest country by
area and the second-most populous country, with over 1.2 billion people living in 28
states and 7 Union Territories. These states are very diverse in terms of their culture
and language. In fact, the states were carved out on a linguistic basis in post-independence
India in 1956, on the grounds that people speaking a language share a
culture (States Reorganisation Act, 1956). The diversity of her habitats supports
many ancient fauna and flora, and has given rise to numerous mega-biodiversity wildlife
reserves and protected habitats that exist to date. Of these, the Western and Eastern Ghat mountains,
the Vindhyas and the ranges in the Northeast possessed ideal climatic conditions and
altitude for biodiversity to evolve and settle, thus making these terrains a haven for
most of the tribes of India. At present, the people of India are broadly divided into castes
and tribes. The tribes constitute 8% of the total population of the country. They are
represented in 745 scheduled tribes, numbering 84.51 million (2001 Census). They
occupy 15% of the country’s area. Whether these tribes were the ancient settlers of India,
or are indigenous to it, remains a major open question.

The question of ancient settlers and the autochthonous origin and evolution of people
in India has been an area of interest to many scientists, linguists and archaeologists.
If one argues that tribes represent the earliest settlers in this land, this raises further
questions: were the tribes all over India the same in terms of their origin
and evolution? And could the castes and tribes have been derived from the
same initial settlers? The issue is compounded when we consider the whole of India,
particularly Northern and Northwestern India, because the historical invasions
tend to mask the social fabric. The famous Indus Valley urban civilization (3300-
1300 BCE; mature period 2600-1900 BCE) was a gateway to modern India. It
extended from the east of the Ghaggar-Hakra River valley (Possehl, 1990) to the upper
reaches of the Ganges-Yamuna Doab (Leshnik and Junghans, 1968). It further extended from
the Makran coast of Balochistan in the west, via northern and northeastern Afghanistan, to
Daimabad in Maharashtra in the south. The civilization was spread over some 1,260,000 km²,
making it the largest ancient civilization that subsisted on barley and wheat (Weber,
1991). The subsequent developments in the Indo-Gangetic Doab, and the various
migrations and invasions, also need to be taken into account in investigating the
genetic history of India. In addition, given the geography of India described in the
early part of this introduction, the large number of ancient and historic migrations
that occurred through the various passes of the Hindu Kush ranges, along the sea route on
the western coast and, to a lesser extent, across the northeastern ranges need to be considered as
confounders.

Modern science employing DNA technologies has attempted to unravel the
mysteries of the origin of Indian castes and tribes alike. The caste system in India is
a unique social institution characterized by strict inbreeding and endogamy
(Trautmann, 1981). It is generally believed that settled agriculture led to
sedentary life, land holding and patrilineal inheritance, as confirmed by an NRY
chromosomal study (Chaix et al., 2007). In Tamil Nadu, the lands belonged to the state
until the early Chola period, and only with the advent of wetland irrigation were the
practices of land holding and male hegemony established (Sastri, 1975). A more
recent study of 1,680 Y chromosomes representing 12 tribal and
19 non-tribal (caste) endogamous populations from Tamil Nadu suggested that
population differentiation in this region, particularly of the male lineages, correlated with
agricultural expansions predating the varna system (ArunKumar et al., 2012), with
the mode of subsistence being a major factor in determining the structure of the
populations. These populations share a genetic heritage dating back to the late
Pleistocene (10-30 Kya). The coalescence analysis suggested that social stratification
was established as early as 4-6 Kya, with little admixture during the last 3 Kya. The
study also showed that this genetic structure was not influenced by the later-introduced
varna system, as documented by the Brahmin migrations into the area.
The overall Y-chromosomal patterns, correlating with the time-depth of population
diversification and the period of differentiation, were best explained by the
emergence of agricultural technology in South Asia as described by Fuller (2007).

Many of the genomic studies on Indian populations were carried out on
available samples, with no in-depth consideration given to the demographic profile and
cultural anthropology of the populations studied or to sample collection. Further, many
of these studies used different strategies for grouping populations
and for interpretation, and often considered very small sample sizes. For
example, some studies treated Brahmins as a single entity, or classified Brahmins as
Dravidian speakers on the basis of the language they speak at present (Sengupta et al., 2006);
another treated the Chenchu as a tribe and, because their M17 chromosomes clustered with
those of Brahmins in the tree, interpreted the two as having a common origin (Kivisild et al., 2003). An earlier
symposium on the peopling of India convened by Balasubramanian and Rao (1998) offered
some glimpses into the subject, the issues at hand and their importance in
studying the ‘peopling of India’. A common problem with all these studies was that they
investigated the question piecemeal while trying to draw overarching conclusions. The
geophysical barriers, the subsistence pattern, the geographical expanse, the languages and
their cultural characteristics would all play key roles in population genetic
phenomena such as founder effect, isolation, expansion and dispersal, and all these
need to be considered holistically. This was the aim of The Genographic Project, a
global non-profit, non-patenting research project funded by National Geographic, IBM
and The Ted Waitt Family Foundation, USA. I am a part of its India
investigator team, and that work led to this thesis.

Mankind is a story of migration. Most studies employing uniparental markers support
the coastal route of migration. Earlier work from this laboratory investigated the Y
chromosomes of Tamil Nadu and Kerala (Kavitha, 2008). Similarly, Y chromosome and
mtDNA analyses were carried out on populations of Orissa and North-East India
(ArunKumar, 2012). These studies have given a clear picture of the NRY HG and STR
profiles of these regions and their correlation with subsistence and language. It was
thus of interest to study the other states of the Deccan (Karnataka and Andhra
Pradesh) and the states on the ‘coastal highway’ to India, namely Maharashtra and
Gujarat. Studying the people of the Deccan and south-western India will throw light
on the question of early entry by the coastal route, settlement and expansion. The
specific questions asked were:

i. What are the parameters that determine the NRY genetic structuring
of the ancient populations of Gujarat, Maharashtra, Karnataka and
Andhra Pradesh?

ii. Did the tribes and castes of these regions, or the various tribes and
various castes among them, have a common origin?

iii. Can we explain Dravidian by studying the populations of the Deccan?
Can its gene pool or language be equated with any of the NRY HGs, as
suggested by Sengupta et al. (2006)?

iv. What is the distribution pattern of the HG J clades that are sporadically
seen in Indian populations? Does it correlate with any technological
advancement or culture?

v. What is the mechanism of caste formation in India? Can it be
answered by studying in depth a well-defined caste such as the
Nattukottai Chettiar?

Many surprises were in store: the study confirmed that the population structure of
the various states was determined by different factors, and that no single model can
be proposed to explain the peopling of the study states. Each state, geographic
region and language group thus needs to be considered both individually and as a
single entity to understand the population genetic mechanisms and confounders that
led to the present scenario. Unravelling these parameters is essential to deconstruct
the history of contemporary populations. This thesis has attempted to study the
above-mentioned aspects.




2. REVIEW OF LITERATURE 


Human genetics is the study of the inheritance of human variation. From the days of
Gregor Mendel to the present genomic era and the completion of the Human Genome
Project, the markers studied have shifted from pea pods and pod colours to SNPs and
STRs (Lander et al., 2001). Genome sequences and their variation in populations have
given ample scope to study genes and loci of medical and evolutionary significance.
From an evolutionary perspective, genes and genomes have helped in understanding the
origin and dispersal of various populations. Human population geneticists,
anthropologists and archaeologists seized the occasion to employ these modern tools
to decipher how human populations in various continents, countries and regions of the
world diverged from one another, and how, where and when our species originated. The
genomic era was thus ushered in by many great discoveries and the development of
various technologies, such as:

a. Discovery of the polymerase chain reaction (PCR) (Mullis, 1990)

b. Automated sequencing using fluorophores (Olsvik et al., 1993)

c. Cataloguing of the whole human genome sequence as an outcome of the Human
Genome Project, and its availability in the public domain (Lander et al., 2001)

d. Discovery of many evolutionarily significant uniparental markers that are
unidirectional, such as mtDNA and NRY (Cann et al., 1987; Underhill et al.,
1997)

All these made it possible to gain more exact insight into human genetics in
various contexts.

In the present study, I have attempted to study the migration and isolation patterns
of selected Indian populations using the Y chromosome as a tool. I have selected
Gujarat and the Deccan states (Maharashtra, Karnataka and Andhra Pradesh) for this
investigation. To better understand the conceptual framework of the subject and to
define the origin, expansion and migration of populations in the study regions, it is
essential to understand the terrain, archaeology, culture and languages of the study
region. I present a concise review of these, followed by the subject matter.


2.1 The Land and its People: 


The Indian peninsula is located at the junction of three continents, viz. Africa,
Europe and Asia. India therefore played a crucial role in the housing and dispersal
of early modern humans, which enhanced its cultural, linguistic and genetic
diversity. It is of great importance to address the genetic diversity of contemporary
Indian populations and to relate its patterns to the cultural, linguistic and
demographic histories of the people (Majumder, 1998). Early human habitation in India,
as inferred from archaeological, linguistic and genetic studies, is outlined below:


2.1.1 Archaeological time scale of Indian people: 


2.1.1.1 Late Pleistocene and first cultures in India: 


The Late Pleistocene (ca 250,000-10,000 years ago) played a major role in the
history of South Asia. The earliest known culture is evidenced by stone tools from
the banks of the Sohan River in the Siwalik Hills and Rawalpindi, Pakistan (Terra and
Paterson, 1939), and is named the Soanian culture. It was followed by the Acheulian
culture, which extended from north of the Siwaliks to Madras (Misra, 1987). Acheulian
tools from Maharashtra and Karnataka have been dated to 350,000 Ybp (Mishra, 1992).
Hunting was the main occupation; chopping tools, cleavers, scrapers, blades and cores
were used for hunting.


2.1.1.2 Middle and Upper Palaeolithic age in Deccan India: 


The Middle Palaeolithic age (ca 20,000 - 42,000 Ybp) corresponds to the period of
the Neanderthal remains in Europe, but no such remains have been found in India.
However, stone tools have been found in the Narmada river basin (Khatri, 1962), the
Chota Nagpur Plateau (Ghosh, 1970), the Deccan plateau (Sankalia, 1956) and the
Eastern Ghats (Murty, 1966). Traps, nets and snares were probably used during this
period. The Upper Palaeolithic (ca 32,000 - 14,800 Ybp) tools used by the people of
Central India and the Eastern Ghats included bored stones that resemble the net
sinkers used by the present-day Yanadi tribe (Andhra Pradesh) and the nets of the
Voda Balija (Andhra Pradesh) and other fishing communities. Food procurement during
the Upper Palaeolithic is therefore inferred to have been based on aquatic systems.
The current study includes populations from the Deccan plateau and the Eastern Ghats.

2.1.1.3 Mesolithic and technology advancement: 

The Mesolithic age in India is marked by microlithic tools, the making of gums and
the use of the bow and arrow (Wakankar and Brooks, 1976). Bifacial points made by
pressure flaking are a characteristic feature of the Mesolithic industries of the
coastal dunes of southern Tamil Nadu (Zeuner and Allchin, 1956) and Sri Lanka. The
first colonisation of the Ganga plains started in this period (Sharma et al., 1980).
The nomadic lifestyle gave way to a seasonally sedentary life. The earliest evidence
for disposal of the dead in burial grounds, in extended and crouched positions, comes
from this period. Animals such as the dog, sheep, goat and cattle were domesticated.
The first cultivated plants were wheat and barley. Rice cultivation and pig
domestication reached the Middle Ganga via China. The Jurreru valley in southern
India shows technological advances in the baking of microblades to shape them
(Petraglia et al., 2007).


2.1.1.4 Neolithic and the Indus Valley civilization: 


The Neolithic age in India (8000 - 7000 BC) is defined by agriculture. Mehrgarh is
the oldest known agricultural settlement in the Indian subcontinent (Jarrige, 1986).
It is located on the banks of the river Bolan (a tributary of the Indus). The history
of this site can be divided into 8 periods (Willey et al., 2001). Periods I-V are
marked by polished tools, long-distance trade in beads, and pottery. Terracotta human
figurines appear in this era. Cotton seeds and a new breed of barley have been
identified. Use of timber is also seen at the Neolithic sites. Polychrome pottery
developed. Period VI shows pipal leaf and humped bull designs, and worship of a female
deity and Shiva. Swastika symbols have been identified. This culture moved to
Nausharo in the third millennium BC (Jarrige, 1990).

The population explosion in Baluchistan forced people to move into other regions of
the Indus valley and the now-dry Ghaggar-Hakra River in the fourth millennium BC.
They also spread to Gujarat, the north-west of Rajasthan, Punjab, Haryana, western
UP, Pakistan and southern Afghanistan. Scholars believe that the Ghaggar-Hakra was
the sacred Saraswati River eulogised in the Rigveda. A change in the course of the
Yamuna and Sutlej rivers could have caused the Saraswati to dry up, which eventually
led to the abandonment of the Indus sites for other areas. Other explanations for the
abandonment of these sites include reduced rainfall, foreign invasions and
environmental degradation due to excessive use of soil and plant resources (Saraswat,
1993). Populations with an oral history of migration from the Saraswati River basin
have been sampled for this study.

The main agricultural crops of the Harappans were wheat and barley in the Indus
region and millets (bajra, ragi, little millet, Italian millet) in Gujarat. Rice was
added by ca 2500 BC (Fuller, 2007), as they came into contact with the civilization
of the Ganga plains (Weber, 1991). Coastal trade was established with the Persian
Gulf countries. Social and economic stratification appears in Harappan society, with
ruling classes, farmers, artisans, traders, priests and workers. The dead were
buried. The reading of the Indus script remains unresolved to date. The script
consists of pictographic signs on seals and tablets, written from right to left and
in some cases boustrophedonically (Parpola, 1994; Possehl, 1996). Parpola (1994)
suggested that the Harappan inscriptions are mainly Dravidian. Elamite and the
Dravidian languages (together proposed as the Elamo-Dravidian macro-family) have been
shown to be closely related in vocabulary (McAlpin, 1981).


2.1.1.5 Neolithic farming and traditions in North and North east India: 


Farming communities also emerged outside the Indus and Harappan sites. These
Neolithic sites were restricted to the Kashmir valley, the northern Vindhyas, the
Ganga valley, south India, and east and north-east India. The tools developed in the
Kashmir valley were distinctive and are otherwise found only at Neolithic sites in
north China. The main crops were winter wheat, barley, lentil and peas (Kajale, 1991;
Lone et al., 1993), which were probably derived from the Near East. Animals such as
cattle, goats and sheep were domesticated. Hand-made and wheel-made pottery
developed, similar to that of Pakistan.

The Neolithic of the northern Vindhyas and the Ganga valley was crucial, as this
region was a meeting ground for Indo-European (IE), Austro-Asiatic (AA) and Dravidian
(DR) speakers (Misra, 2001). The communities of the Vindhyas mainly practised
shifting or plough cultivation. The Ganga plains, with their alluvial soil, supported
agriculture by 3000 BC (Costantini, 1987; Misra, 2001). Hunting groups were
assimilated into the agriculture-based society. Rice, millets, monsoon pulses and
winter crops were grown. At the end of the third millennium or in the second
millennium BC, wild varieties of rice, millets, pigeon pea, lentils and pulses were
grown in east India (Fuller, 2007). Artefacts related to fishing are also seen. Yams
and taros were cultivated in north-east India. The AA speakers raised wooden
memorials for the dead.

2.1.1.6 Neolithic and farming in Deccan: 

In Saurashtra, agriculture was dominated by millets native to India. By ca
2000-1700 BC, crops from Africa such as sorghum, pearl millet and finger millet had
been introduced. Pulses and legumes were also chief crops. The static frontier, in
which agricultural groups interact with hunter-gatherer groups for trade, is best
inferred for Gujarat and Rajasthan (Fuller, 2006; Fuller, 2007).

Neolithic sites in south India are found in north Karnataka, west Andhra 
Pradesh and north Tamil Nadu. Many of them occur on the flat tops, slopes and foot 
of granitic hills but some are also found on the alluvial banks of rivers like the 
Godavari, Krishna, Penneru, Tungabhadra and Kaveri (Paddayya, 1973; Murty, 
1989). In the south, the Neolithic age is marked by ash mounds. The economy was
chiefly agro-pastoral. A few non-ash-mound sites have been identified in south
Karnataka (Fuller, 2006). Ethno-historical data suggest that animal herding, the
chief occupation of populations such as the Gollas of Andhra Pradesh, the Kurubas of
Karnataka and the Dhangars of Maharashtra, started during this age (Murty, 1989).
These populations have been included in this study. The earliest evidence of writing
in South India, associated with Tamil Sangam literature, dates to the third century
BC. Burial was the means of disposing of the dead.

2.1.1.7 Chalcolithic age in Deccan: 

In India, the Chalcolithic evolved in parallel with the Neolithic age. The discovery
of copper and bronze led to improvements in tools, weapons, ornaments, architecture
and pottery. Chalcolithic cultures are found in west and central India, Rajasthan,
Malwa, the Vindhyas and the Ganga valley. The northern Deccan or western Maharashtra,
particularly the semi-arid region east of the Sahyadris, has provided the best
evidence of Chalcolithic cultures in India (Dhavalikar, 1997). The Malwa culture
(movement of Malwa people from central India with agriculture) and the Jorwe culture
(agricultural colonisation) are characteristic of Chalcolithic Maharashtra. Buddhism
and Jainism prevailed during this period. The first Indian empire, Magadha, arose.
With the introduction of iron, cultural development shifted to south India. Black and
red ware pottery and painted grey ware (PGW) are the main features of the Iron Age
(Agrawala, 1989). PGW was first found at Ahicchatra in the Bareilly district of Uttar
Pradesh (Ghosh and Panigrahi, 1946). Populations with an oral history of migration
from this region have been sampled for the study. In the first millennium BC, the
Megalithic culture developed.

2.2 Origin/Spread of People and languages in India- evidences from linguistic 
studies: 

In India there are four main language families with strong regional affiliations:
IE, DR, AA and Tibeto-Burman (TB). The present study includes IE- and DR-speaking
populations and a few AA-speaking populations, to test the correlation of language
with NRY genes and the migratory patterns of my study populations. The hypotheses put
forward by linguistic and genetic studies are outlined below.

Two models have been proposed for the spread of these languages, namely the “wave
model” (Ammerman, 1984) and the “elite dominance” model (Renfrew, 1988). According to
the wave model, the agricultural surplus led to a higher population density among
farmers than among hunter-gatherer communities, and this expansion carried languages
and genes into other areas. In the ‘elite dominance’ model, the language of a small
invading group is adopted by a large resident population. This language shift occurs
either by force or owing to its social advantages.




2.2.1 Indo-European languages: 


There are several theories on the spread of Indo-European languages into India. The
Anatolian theory claims that IE languages spread from Anatolia with agriculture
~8000-9500 years ago (Gray and Atkinson, 2003; Bouckaert et al., 2012). The Kurgan
theory holds that warriors from north of the Black Sea invaded Europe between 4300
and 2800 BC and imposed their language on Europeans (Mallory, 1989). Astronomical
references in Vedic literature indicate the presence of IE speakers in the fourth
millennium BC or earlier, making India another probable homeland of IE speakers. The
spread of IE speakers into south India has been associated with settled agriculture
and irrigation technologies (Sastri, 1975).

Y chromosome studies suggest that the caste populations of India are mainly derived
from Indo-European speakers who migrated from Central Asia ~3.5 kya (Cordaux, 2004).
The presence of Y HG R-M17 at a frequency of 40% in caste populations of India and
Central Asia, compared with a relatively low frequency (9%) in tribal populations,
supports this view. In contrast, Sharma et al. (2009) suggested an autochthonous
origin of this HG.


2.2.2 Dravidian languages: 


Robert Caldwell was the first to use the word ‘Dravidian’, in 1856, and to propose an
autochthonous origin of Dravidian, contrary to the widely held view that the
Dravidian languages originated from Sanskrit. He also suggested an affinity of the
Dravidian languages to the Scythian languages. Different schools of thought exist on
the origin and spread of the Dravidian languages:

(i) Speakers of Proto-Elamo-Dravidian languages from Elam carried their language and
agricultural technologies from the Zagros Mountains in south-western Iran to India
(Menozzi et al., 1994; Renfrew, 1996). This theory also proposes that Central Asian
pastoral nomads brought the IE languages to Iran, Pakistan and north India ~4000 Ybp
by an elite dominance process (Renfrew, 1988).

(ii) Based on lexical reconstructions of flora, place names, modern language
geography and archaeological evidence, Parpola (1994) provided evidence for a
proto-Dravidian origin in the Indus region. Brahui, a Dravidian-speaking isolate in
present-day Pakistan, is thought to have migrated from the north Dravidian region of
central India within the past millennium, as evidenced by its vocabulary (Elfenbein,
1987; Fuller, 2007). Alternatively, it has been hypothesised that Brahui is a remnant
of a once widespread Dravidian language that was eventually replaced by the influx of
IE speakers into India.

(iii) Scientists working with genomic tools have proposed an autochthonous origin of
Dravidian speakers in southern India (Sengupta et al., 2006).

2.2.3 Austro Asiatic language: 

The AA languages are divided into two main branches, Munda and Mon-Khmer.
Archaeological evidence such as rice domestication, together with linguistic
evidence, supports a south-east Asian origin (“Britannica Online Encyclopedia,” 2012;
Diamond and Bellwood, 2003). Linguistic studies by Witzel (2005) point to east India
as the birthplace of AA speakers. Genetic studies by Basu et al. (2003) and Majumder
(2001) indicate that AA speakers had an indigenous origin in India.


2.2.4 Tibeto Burman languages: 


The TB-speaking populations mainly occupy the north-east region and the Himalayas of
India. Archaeological records indicate that these populations arose 5000-6000 years
ago (Guha, 1936). This age estimate is consistent with the Y STR data of Su et al.
(2000) from the Yellow River basin, China. TB speakers probably entered India through
multiple routes along the Himalayas, carrying YAP lineages into India (Sahoo et al.,
2006). TB speakers migrated with the female population from Burma, bringing the
Naga-Kuki-Chin languages and Y HG O3e into the subcontinent.

2.3 Castes and tribes of India 

The “Caste System” in India is a unique phenomenon characterised by inbreeding and
endogamy. Its origins remain highly controversial to date. Ethnographic and genetic
evidence both indicate that the castes of India have been highly endogamous for a
considerable length of time (Karve, 1968; Bhasin and Shampa, 1994). Castes are mainly
associated with agriculture and are concentrated in the alluvial and coastal plains
of the country, whereas the tribes, who constitute 8% of the total Indian population,
mostly occupy the hilly and forested tracts (Bhasin, 2006).

Kivisild et al. (2003) reported that tribes and castes share a considerable
Pleistocene heritage, with limited recent gene flow between them, whereas Cordaux
(2004) observed that castes and tribes may have independent origins.

However, the issues concerning the antiquity and past genetic history of the tribal
populations, and the confounding influences of region, language and ethnicity, have
remained unresolved (Krithika et al., 2009). Their study addressed the issue further
by proposing different models:

a. The derived tribes retained their common population name and language from
the early settlers.

b. The derived sub-tribes retained a common ancestry but acquired different
languages.

c. Sub-populations derived from (two) different ancestries retained their
separate ethnicity but adopted a common language.

This raises the possibility that different tribal or caste populations may have the
same or different origins. In-depth analysis of ethnographic details and language
shifts, along with the genetic data, is needed to decipher their origins and
antiquity.

2.4 NRY chromosome as a tool to study Population histories: 

Several biological markers, such as blood protein polymorphisms, HLA, mitochondrial
DNA and the Y chromosome, have been used to infer population histories. Of these, the
Non Recombinant Y (NRY) chromosome is the best suited for deciphering the migratory
patterns of male lineages, for various reasons.

The Y chromosome is one of the smallest chromosomes in the human genome (~60 Mb) and
represents 2-3% of the haploid genome. It evolved from a pair of autosomes around 300
million years ago (Lahn et al., 2001). The Y chromosome determines sex in humans and
maintains the male germ cells. About 95% of the Y chromosome is non-recombining
(NRY). It contains several repetitive DNA sequences, organised as palindromes in
which two very long, similar sequences point in opposite directions and are joined by
a spacer (Charlesworth, 2003).


2.4.1 Structure of Y chromosome: 


Chromosome banding techniques have revealed three main regions on the Y chromosome,
namely the pseudoautosomal region (PAR), the euchromatin and the heterochromatin. The
PAR is divided into two parts, PAR1 and PAR2. Fig. 2 gives a schematic representation
of the Y chromosome. PAR1 is located at the terminal region of the short arm (Yp),
whereas PAR2 is located at the tip of the long arm (Yq); they span approximately
2660 kb and 320 kb of DNA respectively. PAR1 exchanges genetic material with the X
chromosome during meiosis. Deletion of PAR1 results in male sterility and failure of
pairing during meiosis (Helena Mangs and Morris, 2007).




Figure 2: Schematic representation of Y chromosome showing the position of
YSTR markers employed in this study

[Ideogram labels, from the tip of Yp to the tip of Yq: PAR1; DYS393, DYS456, DYS458,
DYS19; centromere; DYS391, DYS437, DYS439, DYS438, DYS388, DYS426, DYS385 a/b,
DYS392, DYS448, DYS635, YGATAH4; heterochromatin; PAR2.]


Distal to the PAR is the euchromatic region, spanning 23 Mb of the male-specific
region of the Y (MSY) (Skaletsky et al., 2003). It has three main classes of
sequence: the X-transposed region, the X-degenerate region and the amplicon region.
The X-transposed region is mainly populated by Alu, retroviral and Long Interspersed
Element 1 (LINE 1) sequences. Alu markers find application in population genetic
studies. The X-degenerate region and the amplicon region are responsible for
maintaining the normal biological functions of the Y chromosome.

The distal Yq region is heterochromatic and contains numerous highly repetitive DNA
sequences, including DYZ1 and DYZ2, as well as genes responsible for the biological
functioning of the Y chromosome. It also houses several genetic markers used in
population genetic studies (de Carvalho and Santos, 2005).

2.4.2 Mutations of interest on Y chromosome: 

Two types of mutation in the non-coding regions of the Y chromosome accumulate over
time and are stable: biallelic polymorphisms, or unique event polymorphisms (UEPs),
and Short Tandem Repeats (STRs) (Jobling and Tyler-Smith, 2003).

The biallelic polymorphisms are slowly mutating markers comprising single nucleotide
polymorphisms (SNPs), insertions/deletions (indels) and Long Interspersed Element
(LINE) insertions. The first biallelic marker to be identified was an Alu insertion
(YAP), present in the majority of African populations but absent in European
populations. A Y haplogroup can be defined as all the male descendants of the single
person in whom a particular SNP mutation first arose. Haplogroups characterise the
migration of population groups.




The long arm of the Y chromosome contains large, interspersed, tandemly repeated
arrays called Y STRs, also known as microsatellites. Mutations in microsatellites
arise from polymerase slippage during DNA replication (Lodish, 2000). The chromosomal
positions of the Y STRs included in this study are shown in Fig. 2.

DYS19 was the first polymorphic Y marker to be identified. A core set of YSTR
markers referred to as the minimal haplotype includes DYS19, DYS389I/II, DYS390,
DYS391, DYS392, DYS393 and DYS385 a/b (Butler, 2001; Kayser et al., 1997; Roewer et
al., 2001). Three models have been proposed for the origin of diversity in Y STRs.
The first is the “stepwise mutation model” (SMM), in which each mutation event
involves the gain or loss of one repeat unit, resulting in expansion or contraction
respectively (Ohta and Kimura, 1973). The product of a mutation is often an already
existing allele, which implies that the two alleles had a common ancestor. It is the
most preferred model for calculating genetic relatedness between individuals or
populations. Its drawback is homoplasy, the phenomenon in which two alleles are
identical in state but not identical by descent, which leads to underestimation of
divergence. The second is the “infinite allele model” (IAM), which states that every
mutation generates a new allele. A locus is identical in two individuals only if no
mutation has occurred in either lineage since their common ancestor; the probability
of this is $e^{-2\mu t}$, where $\mu$ is the assumed mutation rate and $t$ is the
time in generations. This model scores loci as match or no match. Its disadvantage is
that it can underestimate the TMRCA, because the total number of mutations may be
undercounted. The third is the “K allele model”, which states that a microsatellite
can mutate at random to any of “K” alleles.
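
The contrast between the first two models can be made concrete with a short
numerical sketch. The Python code below is an illustration only and is not part of
the analysis in this thesis: the function names, the per-generation rate of 0.002 and
the starting allele of 14 repeats are assumptions chosen for the example. It
evaluates the IAM match probability and simulates a single lineage under the SMM, in
which the net change in repeat number can be smaller than the number of mutation
events (homoplasy).

```python
import math
import random
from typing import Tuple


def iam_match_probability(mu: float, t: float) -> float:
    """IAM: probability that neither of two lineages has mutated at a locus
    in t generations since their common ancestor, exp(-2 * mu * t)."""
    return math.exp(-2.0 * mu * t)


def simulate_smm_lineage(start: int, generations: int, mu: float,
                         rng: random.Random) -> Tuple[int, int]:
    """SMM: in each generation the allele gains or loses one repeat with
    probability mu. Returns (final repeat count, number of mutation events)."""
    repeats, events = start, 0
    for _ in range(generations):
        if rng.random() < mu:
            step = rng.choice((-1, 1))
            if repeats + step >= 1:  # assumed floor: at least one repeat remains
                repeats += step
                events += 1
    return repeats, events


if __name__ == "__main__":
    mu = 0.002  # illustrative per-generation rate (assumption)
    print("IAM match probability after 250 generations:",
          round(iam_match_probability(mu, 250), 3))
    final, events = simulate_smm_lineage(14, 5000, mu, random.Random(1))
    # Back-mutations hide events under the SMM: |final - start| <= events.
    print("net change:", abs(final - 14), "mutation events:", events)
```

Because steps in opposite directions cancel, the observed difference in repeat number
gives only a lower bound on the number of mutations, which is the source of the
underestimated divergence noted above.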




2.4.3 Mutation rates: 


In population genetic studies, the dating of Y lineages and demographic events
relies on knowledge of mutation rates. Mutations on the Y chromosome are random and
obey first-order kinetics. The genetic diversity of a locus is a function of the
mutation rate and the effective population size (Burgarella and Navascués, 2011).
Absolute mutation rates have been estimated by pedigree analysis or from Y chromosome
microsatellite variation within a Y HG.

In pedigree analysis, a direct count in deep-rooted lineages yielded a mutation rate
of $2 \times 10^{-3}$ per generation (Heyer et al., 1997). A study by Kayser et al.
(2000) on father/son pairs yielded a mutation rate of $3 \times 10^{-3}$ per
generation. A study of 18,000 DNA sequences from sperm cells gave a mutation rate of
$2 \times 10^{-3}$ for the YSTRs DYS19 and DYS390 (Holtkemper et al., 2001).
Pedigree-based mutation rates are calculated per meiosis.

By counting the number of mutations in the branches of a median network of Native
American populations, Forster et al. (2000a) identified a difference between
evolutionary and pedigree mutation rates. Pedigree-based rates gave lower age
estimates. The discrepancy between the two age estimates could be due to the use of
fast-mutating markers and to the age of the individuals sampled in pedigree analyses
(>30 years). Mutation rates are known to increase with age, and present-day paternal
ages may not reflect those of prehistoric fathers.

Zhivotovsky et al. (2004) estimated the effective mutation rate using YSTR data
within a Y HG. The value was found to be $6.9 \times 10^{-4}$ per 25 years. This
value was used to estimate the expansion time of African Bantu populations, the
divergence of Polynesian populations and the origin of the Gypsy population in
Bulgaria. Evolutionary mutation rates are based on current variation in
microsatellites, which has been influenced by reverse mutations to old alleles and
forward mutations to new alleles. This mutation rate has been widely used in
evolutionary studies, including the present study.

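
To illustrate how an effective mutation rate translates into an age estimate, the
sketch below follows the general approach associated with Zhivotovsky et al. (2004):
the average squared difference (ASD) in repeat number between sampled haplotypes and
an assumed founder haplotype, averaged over loci, is divided by the rate per 25
years. It is a simplified illustration under stated assumptions and not the procedure
used in this thesis: the per-locus median is taken as the founder, the function names
are hypothetical, and the four two-locus haplotypes are invented toy data.

```python
from statistics import median
from typing import List, Sequence

EFFECTIVE_RATE = 6.9e-4     # per locus per 25 years, the value quoted in the text
YEARS_PER_GENERATION = 25   # generation length implied by that rate


def founder_haplotype(haplotypes: Sequence[Sequence[int]]) -> List[float]:
    """Assumed founder: the median repeat count at each locus."""
    n_loci = len(haplotypes[0])
    return [median(h[i] for h in haplotypes) for i in range(n_loci)]


def asd_to_founder(haplotypes: Sequence[Sequence[int]],
                   founder: Sequence[float]) -> float:
    """Average squared difference from the founder, over samples and loci."""
    total = sum((allele - anc) ** 2
                for h in haplotypes
                for allele, anc in zip(h, founder))
    return total / (len(haplotypes) * len(founder))


def estimate_age_years(haplotypes: Sequence[Sequence[int]],
                       rate: float = EFFECTIVE_RATE,
                       years_per_generation: float = YEARS_PER_GENERATION) -> float:
    """Age = ASD / rate (in generations), converted to years."""
    founder = founder_haplotype(haplotypes)
    generations = asd_to_founder(haplotypes, founder) / rate
    return generations * years_per_generation


if __name__ == "__main__":
    toy = [[14, 10], [14, 10], [15, 10], [14, 11]]  # invented repeat counts
    print(round(estimate_age_years(toy)), "years")
```

The estimate scales linearly with the ASD and inversely with the rate, so replacing
the evolutionary rate with a faster pedigree rate would give a proportionally younger
age, in line with the discrepancy described above.
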
2.5 Nomenclature system for YSNPs and YSTRs 

As the number of binary markers discovered increased, different systems were used to
name these SNPs. Seven different nomenclature systems were in use (Jobling and
Tyler-Smith, 2000; Hammer et al., 2001; Underhill et al., 2000; Karafet et al., 2001;
Semino et al., 2000; Su et al., 1999; Capelli et al., 2001). To develop a uniform
method of naming these SNPs, the Y Chromosome Consortium (YCC) laid down rules in 2002
(Consortium, 2002). The combination of binary polymorphisms yielded a phylogenetic
tree based on the maximum parsimony method. Capital letters A-R were used to identify
18 major clades; the letter is followed by the name of the terminal mutation that
defines the haplogroup (HG). Lineages not defined by a derived mutation were placed
at the interior nodes of the tree. These are referred to as paragroups and are
indicated by the symbol * (Karafet et al., 2008). Subsequently, Jobling and
Tyler-Smith (2003) revised the YCC 2002 tree to include the markers discovered after
2002. Later, Karafet et al. (2008) revised this nomenclature, with clades from A-T.
This is the most commonly used tree. The International Society of Genetic Genealogy
(ISOGG), a non-commercial organisation formed in 2005, frequently updates the Y
chromosome phylogenetic tree. The phylogenetic tree is shown in Fig. 3.

The Human Genome Organisation nomenclature (HUGO, 2012) laid down the 
standardized YSTR nomenclature regulations which are mentioned below: 


1. All YSTRs are represented by DYS following the name of the STR. 


21 


Figure 3: Phylogenetic tree of Y chromosomal haplogroups (adapted from Karafet et al., 2008) 


2. Alleles should be named based on the number of variant and non-variant 
repeats determined from sequence data. Single repeat units located next to the 
main repeat motif and having the same sequence should be considered part of 
the repeat. 

3. Repeat units that are not adjacent to the main repeat motif and have fewer than 
three units with no size variation (in humans or chimpanzees) should not be 
considered for nomenclature. 

4. Intermediate alleles (e.g. 11.1) should be represented by the number of 
complete repeat units and the number of bp of the partial sequence, separated 
by a decimal point. 

5. Intermediate alleles formed by mutations in the flanking region that alter the 
allele length should be represented by the number of full repeat units followed 
by the direction and position of the mutation relative to the STR. 

6. Point mutations that affect PCR annealing should be verified by sequencing, 
and designations must be used to represent them as per the guidelines. 

7. New sequence variation should adhere to the locus delimiting criteria. 

8. Journal editors, reviewers and organisers should use the standardised 
nomenclature to ensure uniformity of usage. 

9. Commercial Yfiler kits should also follow the standardised nomenclature. 
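
The rule for intermediate alleles given above can be sketched in Python as follows. 
This is only an illustration: the function name allele_designation and the idea of 
passing the length of the repeat region in bp are assumptions made for the example, 
and it covers only alleles with a partial repeat, not flanking-region mutations. 

```python
def allele_designation(region_length_bp: int, motif_length_bp: int) -> str:
    """Name an allele as '<complete repeats>' or '<complete repeats>.<extra bp>'."""
    if motif_length_bp <= 0 or region_length_bp < 0:
        raise ValueError("lengths must be positive")
    complete, partial_bp = divmod(region_length_bp, motif_length_bp)
    if partial_bp == 0:
        return str(complete)
    # Intermediate allele: complete repeats and leftover bp, separated by a decimal point.
    return f"{complete}.{partial_bp}"


# A 45 bp repeat region with a 4 bp motif: 11 complete repeats plus 1 bp.
print(allele_designation(45, 4))  # 11.1
print(allele_designation(44, 4))  # 11
```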


2.6 Evolution of NRY HG markers and their global distribution: 


The global distribution and the proposed routes of migration of the various Y 
haplogroups prevalent in India are shown in Fig. 4a-g. Accordingly, selected HGs are 
seen at high frequencies in a given geographical region or population. The 
unidirectional evolution of NRY HGs has thus made it possible to infer the human 
occupation of various parts of the world. The existing knowledge on this aspect is 
summarized below. Tables 1a-1f show the percentage frequencies of various 
haplogroups for 13,779 Y chromosomes from 131 geographical areas. 
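
The percentage frequencies reported in these tables, and their West to East ordering, 
can be reproduced along the following lines. This is a minimal sketch: the haplogroup 
calls in the example are hypothetical, the function names are assumptions, and only 
the three pairs of coordinates are taken from Table 1a. 

```python
from collections import Counter


def percentage_frequencies(haplogroups):
    """Percentage frequency of each haplogroup among the typed samples."""
    if not haplogroups:
        return {}
    counts = Counter(haplogroups)
    total = len(haplogroups)
    return {hg: round(100 * n / total, 2) for hg, n in counts.items()}


def order_west_to_east(rows):
    """Sort location records by increasing longitude."""
    return sorted(rows, key=lambda row: row["longitude"])


# Hypothetical haplogroup calls for four sampled Y chromosomes.
print(percentage_frequencies(["H", "H", "L", "R"]))

# Coordinates as listed in Table 1a.
locations = [
    {"location": "South India", "latitude": 12.58, "longitude": 78.26},
    {"location": "East India", "latitude": 22.80, "longitude": 86.30},
    {"location": "West India", "latitude": 19.75, "longitude": 75.71},
]
print([row["location"] for row in order_west_to_east(locations)])
```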

2.6.1 HGs with African Origin: 

NRY HG A: Defined by the M91 mutation, this haplogroup is completely restricted to 
the African continent (Hammer et al., 2001; Underhill et al., 2001) and is found most 
frequently in Khoisan populations. The coalescence age of the NRY root was estimated 
to be 142 Kya (Cruciani et al., 2011), suggesting that the earliest common paternal 
ancestor of all humans originated in Africa. This mutation is not found anywhere else 
in the world. 
NRY HG B: This haplogroup is also restricted to sub-Saharan Africa, in 
populations such as Pygmies and Baka (Berniell-Lee et al., 2009). The age of this 
marker is ~50,000-60,000 Ybp. It is the second oldest marker after 
haplogroup A. 

2.6.2 Phylogeography in South Asia 

NRY HG D: This haplogroup is mostly present in Northern and Eastern Asia, is 
frequent in Tibet and Japan, and is present at lower frequencies among Southeast 
Asians and Andamanese (Shi et al., 2008). The age of expansion of this HG was 
about 60,000 Ybp. The relic distribution of this HG in East Asia is attributed to the 
spread of Han culture and the Last Glacial Maximum. 

NRY HG C: Fig 4a shows the distribution of NRY HG C in global populations and 
its migration pattern. This clade is considered to mark the first migrants into India 
(Wells et al., 2001). Paragroup C*-M28 is more frequent in East Asia but is also 
distributed in other regions at low frequencies. YHG C1-M8 is specific to the 
Japanese. Y haplogroup C3-M217 and its subtypes are extensively distributed in East 
Asia, Central Asia and Siberia. This clade is considered to have a Mongolian origin 
(Zhong et al., 2010). The subtypes of NRY HG C3 show a north to south cline. NRY 
HG C5-M356 specific 




Table -1a: Percentage frequencies of NRY HG C and its subclades with increasing longitude 
(West to East) 


16.83 |-148.37 [Polynesia |_|. 0.2] 140.3] | | | | | tT 
[54.53 [105.26 [North America [12.7] | [| | [| [ [| | [| [ | 
30.05 [31.25 [Egypt sft] =| OT CT CT CT CT CT CT CT 
33.85 [35.86 [Lebanon [0.16] | =| | CT CT CT CT CT CT 
34.02 [3618 [Bekaa | Stat] OT CT CT CT CO CC 
28.36 [66.53 [South Pakistan | | | [| | [| [ [| | [| [ [| 
33.94 [67.71 [Afghanistan | | | Test OT OT COT OT CT 
34.79 [71.41 [North Pakistan | | | | | 14] [| [| | [| [ Jf 
31.16 [75.65 [North India_—s | ~~ |_ S|] S| CT CT CT CT CC 
19.75___|75.71__ [West India 13.82] | | | TT 
[12.58 [78.26 [South India__—[4.48fo.os] [| [| | [ 7 [| [ JT 7 | 
22.80 [86.30 __——[EastIndia_ | —.8o] | S| CT CT CT CT CT CT 
43.79 [87.63 [Xinjiang | S| S| T2927 | 0.99] 0.99/3.96] | 
29.65 |91i2 [Xizang | S| CT 3868] Tt 58] 
25.33 [91.27 [NortheastIndia | [0.26] | [| [ [ | [| [ [ J | 
36.62 [101.78 [Qinghai | S| SC] TS tstt OT OT CT CT CT 
25.05 [102.71 [Yunnan 11.3 | | 506] | TT 89 
36.06 [103.83 [Gansu | S| CT CT Ct oT OT CT CT CT 
30.65 [104.08 [Sichuan | S| SC] CT 7] OT OT CT CT CT 
35.86 [104.20 [China | fs] | | CT CT fos} OT OT 
[60.00 [105.00 __—[Siberia_ | S| SC] CT CT CT Tt TT 
38.47 [106.26 [Ningxia | S| SC] Tt] OT TT 4 55] 
29.56 [106.55 [Chongging | | | ST Pst oT UT CT CT CT 
26.60 [106.71 [Guizhou 5.71] | | sof | Tf 429f 
22.82 [108.33 [Guangxi 20.7{_ | | 7.32] | OT CT CT CT 
34.27 [108.95 [Shaanxi | | S| | CT S| UT CT CT CT TT 
20.02 [110.35 [Hainan [30.27 | OT CT CT CT CT CT CT CT 
40.82 [111.77 ___[Neimenggu | | | S| 43.8] | 43.8] fs] 
gti [112.98 [Hunan | | CT 3.3] OT OT CT CT 
23.13 [113.27 [Guangdong 1] | | fo67] | CT CT CT CT 
34.77 [113.69 [Henan | ST CT T3647 OT OT CT OT CT 
0.79 [113.92 [Indonesia _—[7.92{ | | | | [ [| | | [ Tf 
30.55 [114.34 [Bubei_— | | TO to 
36.67 [117.02 [Shandong | =| | | Ct 5 | OT CT 625] 
29.79 [its.26 [Anhui | CT CT CT Test TT OT OT 
32.06 [118.76 Jiangsu | S| CT CT Ct oo] CT CT CT CT CT 
26.10 [119.30 [Fujian | S| CT Tt too] Sf] OT CT CT CT 
30.27 [120.15 [Zhejiang | S| S| CT tt OT CT CT CT CT 
23.70 [120.96 [Taiwan | to T OT CT CT CT CT CT CT CT 
31.23 [121.47 [Shanghai [143] | S| St | OT CT CT CT CT 
41.84 [123.43 [Liaoning | | SC] Ts] Td s6fi2st | 
43.90 [125.33 [Jilin | CT CT 37] tS 
[45.74 [126.66 [Heilongjiang [| [| | | [707] [ | [976]24a] [| 
35.91 [127.77 [Korea | S| SC] | 83.3f | CT CT CT CT 
25.27 [133.77__— [Australia | [5.56] | | | | | | | [602] 
36.20 [138.25 Japan| S| ooo] | | 303] | | CT 
2 OC (CR 
-8.19 [152.83 [Melanesia _—s | — | | tao} | OT CT CT CT CT 
6.88 [158.22 [Micronesia_—|_—*([5.88] 5.88] =| | | | OT UT 


References: 

Kivisild et al., 2003 
Cordaux et al., 2004 
Zegura et al., 2004 
Hammer et al., 2006 
Sengupta et al., 2006 
Thanseem et al., 2006 
Li et al., 2008 
Malhi et al., 2008 
Zhong et al., 2010 
Haber et al., 2012 
ArunKumar, 2012 
ArunKumar et al., 2012 




Table-1b: Percentage frequencies of NRY HG F and its subclades with increasing 
longitude (West to East) 


Pasos [10am yume | |__| 9 
[2065 [tose schon [|_| 9.0 
Tass [10420 onin | |_| 028 
Po [ ue. Iinanes | [om ft 


References: 

Kivisild et al., 2003 
Sengupta et al., 2006 
Thanseem et al., 2006 
Li et al., 2008 
Thangaraj et al., 2011 
Zhong et al., 2011 
ArunKumar, 2012 
ArunKumar et al., 2012 


Figure 4a: Global frequency distribution map of NRY HG C and its sub clades 

Note: Each NRY HG sub clade is represented by a different colour. The size of the pie is 
proportional to the frequency of the haplogroup. The colour key for each haplogroup is shown 
in the box provided in the figure, along with the age of the haplogroup obtained from the 
various literature sources listed in Tables 1a-1g. 


lineages are found in India and had an in situ origin there (Sengupta et al., 2006). 
The YSTR diversity is found to be highest in Austronesian populations. The proposed 
route of migration of M130 into Southeast Asia and Australia was via the Indian coast 
during the Palaeolithic (~50 Kya). The age estimates (Fig. 4a) further support this 
migration route. These geographically specific haplogroups have undergone long-term 
isolation (Zhong et al., 2010). 

NRY HG F: Fig 4b shows the frequency distribution of F lineages, which are mainly 
present in East Asia. Paragroup F* is observed mainly in India. F*-M89 is known to 
have higher STR variance in Tamil Nadu and Andhra Pradesh, near coastal 
eastern India (Sengupta et al., 2006). The Dravidian speaking Nilgiri Hill Tribe 
Foragers (HTF) of Tamil Nadu showed long-term STR evolution 
(ArunKumar et al., 2012). The TMRCA is estimated to be 29,344 years (~29.3 Kya) for 
F*-M89 in this study. In contrast, the F* populations of Orissa seemed to have limited 
STR evolution (ArunKumar, 2012). Northeast populations showed a very low frequency of 
this HG. All these provide evidence for an autochthonous origin and evolution of 
F*-M89 in Southern India, whereas F2-M427 and M428 are restricted to the Lahu, a 
Sino-Tibetan speaking population of China (Sengupta et al., 2006). 
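
The link between STR variance and age estimates discussed above can be made concrete 
with a small Python sketch. It uses a simple variance-based estimator under a stepwise 
mutation model: the mean per-locus variance in repeat number divided by a per-locus 
mutation rate gives a time in generations. This is one common approach and is not 
necessarily the method used in the studies cited here. The default mutation rate and 
generation time are placeholder assumptions, the haplotypes are hypothetical, and the 
function names are introduced only for this example. 

```python
from statistics import mean, variance


def mean_str_variance(haplotypes):
    """Mean sample variance of repeat counts across loci (needs >= 2 haplotypes)."""
    loci = zip(*haplotypes)  # regroup repeat counts locus by locus
    return mean(variance(locus) for locus in loci)


def variance_age_years(haplotypes, mutation_rate=6.9e-4, years_per_generation=25):
    """Variance-based age: (mean variance / mutation rate) generations, in years.

    mutation_rate (per locus per generation) and years_per_generation are
    assumed placeholder values, not figures taken from the text.
    """
    generations = mean_str_variance(haplotypes) / mutation_rate
    return generations * years_per_generation


# Three hypothetical two-locus haplotypes (repeat counts).
sample = [(14, 12), (15, 12), (16, 12)]
print(mean_str_variance(sample))
print(round(variance_age_years(sample)))
```

A higher mean variance therefore translates directly into an older estimate, which is 
why regions of high STR variance are read as candidate centres of origin. 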

NRY HG H: The distribution of HG H and its subclades is mainly restricted to 
India (Fig 4c). The HG H1 spatial frequency maps from the study of Sengupta et al. 
(2006) suggest high STR variance towards the Maharashtra region in coastal 
Western India and high Y HG frequency in Eastern India. The study also revealed that Y 
HG H2-Apt shows high STR variance towards Eastern coastal India and could have 
had an in situ origin in India. A study by Trivedi et al. (2008) revealed that the highest 
gradient is towards west India (44.4%). In Orissa populations, H-M69 and H1a-M82 
were present mainly in Dravidian speaking populations (ArunKumar, 2012). In Tamil 


Table-1c: Percentage frequencies of NRY HG H and its subclades with increasing 
longitude (West to East) 


| Latitude |Longitude| Location | H_| H* | H1 | Hia|Hiai] H1b|H1b3)_H2 | 
| 41.15 | 20.17 |Albania [15.4] | | | CT 
45.94 | 2497 [Romania TA | 
| 35.13, | 33.43 |Cyprus_— [0.62] | | S| CT 
a De 


10.83 | 103.18 cr a 
38.47 | 106.26 [Ningxia | =| =| | 
22.82 | 10833 [Guangxi | | [1.22] 


References: 

Kivisild et al., 2003 
Cordaux et al., 2004 
Cinnioglu et al., 2004 
Sengupta et al., 2006 
Thanseem et al., 2006 
Zalloua et al., 2008 
Hammer, 2009 
Zhong et al., 2010 
Haber et al., 2012 
ArunKumar, 2012 


Table-1d: Percentage frequencies of NRY HG L and its subclades with increasing 
longitude (West to East) 


Location 
South Tyrolean Alps| 3 | | | | | 
Egypt 
Cyprus 
Palestine 
Turkey 
Lebanon 
Bekaa 
Syria 
West Caucasus 
Central Caucasus 
East Caucasus 
South Pakistan 
Afghanistan 
Tajikistan 
North Pakistan 
North India 
West India 
South India 
Central India 
Xinjiang 
Northeast India 
Andaman 
China 
Ningxia 


References: 

Wells et al., 2001 
Qamar et al., 2002 
Kivisild et al., 2003 
Cinnioglu et al., 2004 
Sengupta et al., 2006 
Thanseem et al., 2006 
Zalloua et al., 2008 
Thomas et al., 2008 
Hammer et al., 2009 
Oleg et al., 2011 
Zhong et al., 2011 
Haber et al., 2012 




Figure 4c: Global frequency distribution map of NRY HG H and its sub clades 

Note: Each NRY HG sub clade is represented by a different colour. The size of the pie is 
proportional to the frequency of the haplogroup. The colour key for each haplogroup is shown 
in the box provided in the figure, along with the age of the haplogroup obtained from the 
various literature sources listed in Tables 1a-1g. 


Nadu populations, the Nilgiri Hill tribes speaking a Kannada dialect (HTK) showed a higher 
frequency (42.5%) and age (42.52 Kya) (ArunKumar et al., 2012). The YSTR based 
coalescence time in this study was ~43,556 years. Further study is essential to 
decipher the centre of origin of this haplogroup within India. 

HG H-M69 is also found in populations of Afghanistan such as Pashtuns and 
Tajiks. The presence of this haplogroup in these populations is suggested to be due to 
gene flow from India to Afghanistan (especially H-M69, L-M20 and R2-M124) 
during the Indus civilization or the Bactria-Margiana archaeological complex 
(Haber et al., 2012). Roma gypsies in Western Europe have their founding lineages in 
India. They are the main source of Y HG H in Europe. Table 1c shows the haplogroup 
frequencies in various geographical areas. 
NRY HG L: Y HG L is found mainly in the Indian subcontinent, including Pakistan (Fig 4d). 
However, it is present at lower frequencies in the Middle East, Central Asia, Northern 
Africa and the Mediterranean coast. The sub clades of L, namely L1-M27/76, L2-M317 
and L3-M357, have distinct geographical distributions. NRY HG L1-M27/76 and L3-M357 
are present in Indian and Pakistani populations respectively and are nearly absent in 
Turkey and surrounding areas, suggesting distinct founders in these regions (Thanseem 
et al., 2006). The study by Sahoo et al. (2006) showed the absence of NRY HG L1 in east 
Indian populations and attributed it to geography rather than language, whereas 
the study by Trivedi et al. (2008), based on YSTRs, showed that Dravidian speakers 
harboured a higher proportion of NRY HG L than Indo-European speakers. By 
increasing the phylogenetic resolution, Sengupta et al. (2006) revealed the early 
diversification of HG L1-M76 among Dravidian speakers during the early Holocene 
(~9 Kya). The STR variance of HG L1 in south India is higher than that in 
west India. HG L1 is nearly absent in the northeast and Orissa (ArunKumar, 2012). 
Further study of HG L1 in Tamil Nadu populations showed the affinity of HG L1 
with dry land farmers (ArunKumar et al., 2012). All these provide clues for an in situ 
origin of HG L1 among Dravidian speaking populations that practised farming in 
south India. 

Considering the movement of pastoral groups via Turkey, the Hindu Kush, 
Afghanistan and north India, HG L3 would be expected to be seen in south India. 
Contrary to this, its frequency is only ~0.8% in south India. However, comparison of 
six YSTR loci in four Chenchu tribe members with Lambadis, Punjabis and Iranians 
showed considerable haplotype sharing (14-12-22-10-14-11). This haplotype differs by 
three mutational steps from the modal haplotype of Armenian M20 chromosomes 
(15-12-23-10-13-11) (Weale et al., 2001). L2a is generally regarded as the 
“Mediterranean” clade, as it is present mainly in Turkey. This haplogroup is also found 
at lower frequencies in the Parsi and Oraon populations of India (Genographic data). 
NRY HG L3-M357 is mainly localised in Pakistani populations. Interestingly, a new 
subclade, L3a-PK3, identified in the Kalash population of Pakistan (23%), clustered 
with the Yadava population of Tamil Nadu with a TMRCA of 1400-8100 YBP 
(Mohyuddin et al., 2006a). Thus a global study of the HG L clade would provide more 
insight into the existing controversy over the migration pattern of this clade. 
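
The three-step difference between the shared haplotype (14-12-22-10-14-11) and the 
Armenian M20 modal haplotype (15-12-23-10-13-11) mentioned above can be checked with 
a short Python sketch. The step distance is taken here as the sum of absolute 
differences in repeat number across loci, which assumes a stepwise mutation model; 
the function names are hypothetical. 

```python
def parse_haplotype(text):
    """Convert a hyphen-separated haplotype such as '14-12-22' to a list of ints."""
    return [int(allele) for allele in text.split("-")]


def step_distance(haplotype_a, haplotype_b):
    """Sum of absolute repeat-number differences over matching loci."""
    a, b = parse_haplotype(haplotype_a), parse_haplotype(haplotype_b)
    if len(a) != len(b):
        raise ValueError("haplotypes must be typed at the same loci")
    return sum(abs(x - y) for x, y in zip(a, b))


shared = "14-12-22-10-14-11"          # haplotype shared among the Indian and Iranian samples
armenian_modal = "15-12-23-10-13-11"  # Armenian M20 modal haplotype
print(step_distance(shared, armenian_modal))  # 3
```

The two haplotypes differ by one repeat at the first, third and fifth loci, giving a 
total of three steps. 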

NRY HG O: The spread of NRY HG O2a-M95 has been associated with the spread 
of Austro-Asiatic (AA) populations (Fig 4e). Kumar et al. (2007) suggested that the 
Mundari populations have been the source of O2a-M95 around 65,000 YBP. This 
view is supported by the absence of this clade in other parts of India. The HG 
frequency and YSTR variance were found to be higher in Mundari speakers than in 
Southeast Asian populations, whereas the HG O3 lineages were 
concentrated among Tibeto-Burman (TB) speakers. The study by Trivedi et al. (2008) 
concludes that 


Table-1e: Percentage frequencies of NRY HG O and its subclades with increasing longitude (West to East) 

References: 

Thangaraj et al., 2002 
Shi et al., 2005 
Sengupta et al., 2006 
Thanseem et al., 2006 
Gayden et al., 2007 
Li et al., 2008 
Oleg et al., 2011 
Haber et al., 2012 
ArunKumar, 2012 


Table-1f: Percentage frequencies of NRY HG R and its subclades with increasing longitude (West to East) 


| 7.09 |Moroceo _—fos2| | ano | | | 
a 
5.29 Netherlands 25.00 
| saso | 954 [risa S| | fos | ora | aes | om | 29 | 
33gi | 108s [eda Ss | | | | Ct ts | TT 
ee a 
41.87 12.57 __|Ital 6.21 
| sas3_| 1526 [WestEurpe | | | | | TTT tc oo 
| aris | 1627 [Austrohungary | | | | ass | TT 
ee 
51.92 19.15 |Poland 6.45 
| aiis | 2017 [atania | | | Tass | TT 
| aicr | 2175 [Macedonia | =| | | fof TT CT 
peepee 
56.88 24.60 __|Latvia 100.00 
| 2497 |Romania | | | | soo | | 
| 3533 | 2513 [crete | | S| | tof | 
epee ee 
30.05 31.25 [Egypt y 1.63 3.25 
Fe a a 
| sas | 3412 [amie S| | | | too 
oe 
31.95 35.23 _ [Palestine 0.34 2.01 1.01 0.34 
| 3896 | 3524 [rurke | tas | 
| 3380 | 3550 [Beit Sf 3.33] | gaa || or | 
35.86 _|Lebanon 105 6.24 122 | fa 
34.02 36.18 _|Bekaa 1.41 7.04 2.82 
| 1533 | 3895 [centratsoner | || is | | | 5.59 | 
| saso | 3900 [syria S| | aos | | sv | oso | 


43.80 41.75 _|West Caucassus 12.67 1.00 
43.12 43.55 |Central Caucasus 0.56 0.56 


Fava [4640 amenia | | fo | | 
| 3596 | 4714 |kurdistn | | | | tn | 


40.14 47.58 _|Azerbaijan 
41.55 47.70 _|East Caucasus 5.70 


| 3243 | 53.69 |irn S| | Sf po | fz | | 

| 2836 [66.53 |south Pakistan | | | aso | sis7| | 

| 33.04 | 67.71 [Afghanistan foas| | | aso | | 

| 3ss6_[ 7128 |rajikistm | | ass || 

| 3479 | 7141 North Pakistan [oss] | |_| 
19.75 75.71 _|West India 

| 2297 | 78.66 |centrandia | | | | fsn ft | | 


22.80 86.30 _|East India 0.08 | 0.31 8.01 
43.79 87.63 _| Xinjiang 29.70 
| 2965 | 9112 |xizmg | | | Tis | 
25.33 91.27 __|Northeast India 0.22 8.91 

61.01 99.20 | West Eurasia 11.76 


Qinghai 
Yunnan 


36.06 js_fons 
30.65 104.08 {Sichuan 


38.47 06.26 [Ningxia 
26.60 106.71 _|Guihzou 


20.82 | 108.33 _Guansu 


Gunsu | 
ee 
40.82 111.77 {Inner Mongolia 


29.79 118.26 {Anhui 

| 3206 | 11876 |riangu | | | | tooo | | 

| aisa [123.43 [Liaoning | | | Tf se | | 

<2 eae ee 
36.20 138.25 |Japan 3.03 3.03 

References: 

Wells et al., 2001 
Qamar et al., 2002 
Kivisild et al., 2003 
Cinnioglu et al., 2004 
Cordaux et al., 2004 
Bosch et al., 2006 
Sengupta et al., 2006 
Thanseem et al., 2006 
Hammer et al., 2006 
Capelli et al., 2007 
Martinez et al., 2007 
Gayden et al., 2007 
Li et al., 2008 
Zalloua et al., 2008 
Hammer, 2009 
Gaieski et al., 2011 
Thangaraj et al., 2011 
Zhong et al., 2011 
Oleg et al., 2011 
ArunKumar et al., 2012 
Haber et al., 2012 


Figure 4e: Global frequency distribution map of NRY HG O and its sub clades 

Figure 4f: Global frequency distribution map of NRY HG R and its sub clades 

Note: Each NRY HG sub clade is represented by a different colour. The size of the pie is 
proportional to the frequency of the haplogroup. The colour key for each haplogroup is shown 
in the box provided in the figure, along with the age of the haplogroup obtained from the 
various literature sources listed in Tables 1a-1g. 


the NRY HG O lineages in India had a Southeast Asian origin. These lineages arrived 
at different times, as no HG O3e lineages were found in AA speakers. Another study, 
from the Genographic India-China project, gave evidence for the origin of AA speakers 
in Laos with a TMRCA of 64.2 Kya. This migration was mainly male mediated. 

Distribution of NRY HG R: NRY HG R and its sub clades are geographically 
widespread. HG R-M173 is considered an ancient marker that arose first in Homo 
sapiens sapiens in Eurasia (Al-Zahery et al., 2003). Northern Cameroon is one of the 
populations in Africa that carries the R-M173 mutation, at a frequency of ~40%. The 
origin and spread of the sub clades, especially NRY HG R1a1a-M17, gave rise to 
different schools of thought. The study by Wells et al. (2001) suggested that NRY HG 
M17 lineages and their YSTR diversity are highest in Central Asia (South 
Russia/Ukraine), which could be the probable origin of this marker. Another study, by 
Sharma et al. (2009), showed that the age of R1a1a-M17 in Indian populations was much 
older than that in Central Asian populations, thus supporting an Indian origin of 
R1a1a-M17. Their study suggests that Brahmin populations, which carry R1a1a-M17 at 
high frequency, could be the founders of this haplogroup irrespective of their 
linguistic and geographical affiliation. This further supports the formation of the 
caste system in India. The study from Genographic India showed that the frequency of 
NRY HG R1a1a-M17 was significantly high in Indo-European populations, irrespective of 
their geography. It suggested that the ancestors of HG R1a1a-M17 could have originated 
in India, on account of the low effective population size, high YSTR variance and high 
mean pairwise difference compared with other global populations. NRY HG R2-M124 is 
present at a higher frequency in India, and deeper age estimates suggest an Indian 
origin during the late Pleistocene (Cordaux et al., 2004; Trivedi et al., 2008). 


2.6.3. Major Y haplogroups of the Middle East and Caucasus: 


NRY HG G: This clade is mostly present in the Middle East, Mediterranean and Caucasus 
regions. Semino et al. (2000) suggest that haplogroups J and G had a common 
ancestry. Populations speaking northwest Caucasian languages show a high frequency 
of NRY HG G-M201 (Nasidze et al., 2003). The lineages of G are known to correlate 
with the archaeological areas of the Bronze Age Hattic and Kaska cultures (Cinnioglu et 
al., 2004a). Rootsi et al. (2012) studied 16 informative G clades in populations 
from Europe to Pakistan and associated NRY HG G and NRY HG J2 with the spread of 
agriculture in Europe. The study also proposes that the homeland of this NRY HG 
could be Anatolia, Armenia or Western Iran. 

NRY HG J: Quintana-Murci et al. (2001) suggested that NRY 12f2a spread to India 
during the Neolithic period with farming technology, thereby indicating the entry of 
Indo-Aryan migrants into India through the Western corridor. Their study showed that the 
microsatellite variation of J-M172 is higher (0.947) than that of J(xM172) 
(0.844). Hence J-M172 is an older marker than J(xM172). Their study 
hypothesises that J-M172 could have expanded in the northwest of the Fertile Crescent 
and spread along with agriculture, whereas J(xM172) must have had its centre on the 
eastern side of the Fertile Crescent and expanded into Arab populations. J1-M267 is 
present at a frequency of 9% in Turkey (Cinnioglu et al., 2004a), with a short DYS388 
allele of 13 repeat units. The study by Sengupta et al. (2006) suggests the eastward 
expansion of NRY HG J2a-M410 with agriculture and painted pottery into the Indus 
valley during the Neolithic period. The YSTR based age estimates of Y HG J2a-M410 
and J2b2-M241 exceed the age of agriculture in India, i.e. 6 Kya. 


Table-1g: Percentage frequencies of NRY HG J and its subclades with increasing longitude (West to East) 


ee ee 
ee ee 


joi. boss [oomacuonw [om | | om | | | | | |_| 
am foe ma | tem? | | | | || | |_ 
ss fr fewcwans [im] | [zal | | | | Jen] __| 
nia fm | —pawf | | | | || | |_| 
pase feos _[soumratinan | [ozo] [am] | | | | | _| 220 
fo form _[araaninon [| [966 | [ome{ | [2o| | | _| 0a | 
par [ai fromrawen | 2m? | | | | | | | [am 
pits isos [Nowbings | [sss | | | | | | | |_| sa | 
hers ism _|wentnoa [255 Jose} | | | fom] | | usr 
pase [reas [souminga [773 [ 055 [aoe] | _[omfom| | | aoe | om | 
rms fox fewwmie | fom] | | | | || _} vor 
jam fra [ining | fof fox | [| | va | on | 
boss fm frome | | | | [| | | | [sal |__| 
ps _fpizr _fromewwmims | fom] | | | | | | [en] __| 
suor p20 [wets | | | | | || | [| J2me| | 
pss fiom [ones | [wat | | | | || | |_| 
pas hoses low | [em] | | | | | | | | 


pses foam fom | fost | | | | | | | |__| 
Ee 
—_ 
ie 


paa7 [10626 [Ningxia ff tert of | TT 
p427 [ios [shams | Tt oo | | CT CE CT 


References: 

Oleg et al., 2011 
Zalloua et al., 2008 
Capelli et al., 2007 
Hammer, 2009 
Thanseem et al., 2006 
Kivisild et al., 2003 
Genographic public participation 
Cinnioglu et al., 2004 
Thangaraj et al., 2011 
Haber et al., 2012 
Sengupta et al., 2006 
Zhong et al., 2011 
King et al., 2007 
Gaieski et al., 2011 


Figure 4g: Global frequency distribution map of NRY HG J2a and its sub clades 


Note: Each NRY HG sub clade is represented by a different colour. The size of the pie is 
proportional to the frequency of the haplogroup. The colour key for each haplogroup is shown 
in the box provided in the figure, along with the age of the haplogroup obtained from the 
various literature sources listed in Tables 1a-1g. 


2.7 Previous studies from this laboratory: 


This laboratory has been actively involved in the study of Human Leucocyte 
Antigen (HLA) polymorphism for the past thirty years. The study designs are based on 
the complex inbreeding units that exist in India. HLA DRB1* and DQB1* have been found 
to be specific to the Piramalai Kallar and Yadava populations of Madurai, respectively 
(Shanmugalakshmi et al., 2003). This study also suggests that endogamous units, 
sympatrically isolated castes or well defined breeding isolates that live under the same 
milieu-epidemiology may be ideal models to test the immunogenetic basis of disease. 
Pitchappan (1998) brought out the differences in the distribution of HLA haplotypes 
between Indian and Caucasian populations. 

The leprosy-affected sib-pair studies by whole genome microsatellite mapping 
identified a susceptibility locus at 10p13 (Siddiqui et al., 2001; Tosh et al., 2002). 
This locus was mapped to the disease in C20 families from Tamil Nadu, but was absent 
in the neighbouring state of Andhra Pradesh, thus reflecting the importance of community 
genetics in the genomic era. All these studies indicate that for any case control study, 
the controls have to be matched for age, sex and caste for appropriate comparison 
(Pitchappan, 2002). The study on Eurasian populations gave evidence for the first 
coastal human migration from Africa to Australia via the Indian subcontinent (Wells 
et al., 2001). 

NRY studies on 31 populations of Tamil Nadu suggested that both caste and 
tribal populations had overwhelming frequencies of H-M69, F-M89, R1a1a-M17, 
L1-M27, R2-M124 and C-M130. These lineages date back to the late Pleistocene in these 
populations. The West Eurasian contribution to the Y lineages has been <20%. A strong 
genetic structure was identified, associated with the mode of subsistence of the 
study populations. The social stratification was found to have been established by 
4-6 Kya, predating the establishment of the Varna system. 

Another study on the populations of Orissa and Northeast India showed a 
strong correlation between NRY and language. NRY HGs such as F*, H and H1a were 
more predominant in Dravidian speaking populations of Orissa, whereas the Austro 
Asiatic speakers possessing HG O2a-M95 migrated through the northeast corridor from 
Laos. Laos could have been the probable geographical origin of HG O2a (~64.2 Kya) 
(ArunKumar, 2012). These migrations were male mediated and no mtDNA genetic 
resemblances were found in India. These migrations may be coupled with the practice 
of shifting cultivation. The NRY (HG R1a1a) and mtDNA (M*, M6 and U*) 
composition of Indo-European speakers suggests their autochthonous origin in India. 




3. MATERIALS AND METHODS 


3.1 Sampling: 


A total of 2,522 healthy male volunteers belonging to 33 castes and 18 
tribal populations from Andhra Pradesh (N=774) (latitude: 17.047762, longitude: 
80.098187), Karnataka (N=877) (latitude: 15.317277, longitude: 75.713888), 
Maharashtra (N=458) (latitude: 19.751480, longitude: 75.713888) and Gujarat 
(N=413) (latitude: 22.258652, longitude: 71.192380) were enrolled and sampled 
either in their households or in public places. In addition, 170 samples of Nattukottai 
Chettiars from Chettinad, Tamil Nadu, India (latitude: 11.127123, longitude: 78.656894) 
were collected at two of their community congregations. The ethnographic 
details of these populations are given in Appendix 1. The volunteers were all above the 
age of 18, and written informed consent (Appendix 2) was obtained, witnessed by a 
local interpreter or community leader. The choice of the populations to be sampled was 
based on the advice of anthropologists and geneticists. The study populations were 
selected based on their uniqueness, antiquity and population size. The list of all the 
advisors and collaborators who assisted the Genographic team and the work load 
shared by the members of the laboratory are given in Appendix 3. The sampled 
locations as co-ordinates, caste / tribe names and N collected are shown in Fig 5. 
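
The sample sizes quoted above can be cross-checked with a few lines of Python. The 
per-state counts and the Chettiar count are those given in the text; the variable names 
are introduced only for this check. 

```python
# Per-state sample sizes as stated in the text.
STATE_SAMPLES = {
    "Andhra Pradesh": 774,
    "Karnataka": 877,
    "Maharashtra": 458,
    "Gujarat": 413,
}
CHETTIAR_SAMPLES = 170  # Nattukottai Chettiars, Tamil Nadu

four_state_total = sum(STATE_SAMPLES.values())
grand_total = four_state_total + CHETTIAR_SAMPLES

print(four_state_total)  # 2522
print(grand_total)       # 2692
```

The four state-level counts add up to the stated 2,522 volunteers, and including the 
Chettiar samples brings the number of samples described in this section to 2,692. 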

The current study is a part of The Genographic Project-India. Ethical clearance 
for the study protocols was obtained from Madurai Kamaraj University, Madurai. 
Necessary permissions from local government bodies, village heads and educational 
institutions were obtained before sampling. The volunteers for sampling were 
approached through a local contact. The purpose and methodology of sampling were 
explained to the volunteers, head of the institution or village head in their local 
dialect. On their approval, questionnaires were filled in by the volunteers (Appendix 4, 
5). For the samples collected at each location, a sampling team document and a village 
document were filled in (Appendix 6, 7). The demographic details of the studied 
populations are given in Table 2. 


Figure 5: Map of India showing the sampled locations of various study populations 
1. Gujarat 2. Maharashtra 3. Karnataka 4. Andhra Pradesh 


3.2 Topography of Sampled locations 


3.2.1 Gujarat 


Gujarat is the north-westernmost state of India, on the Arabian Sea (Fig 1). 
It covers a land mass of 196,030 km². It is bordered by Pakistan and Rajasthan to the 
North, Madhya Pradesh and Maharashtra to the East, and the Arabian Sea to the South and 
West. The land mass of Gujarat is divided into three regions: Peninsular Saurashtra, 
Kutch and the Gujarat corridor. Sir Creek (a 96 km strip of water) demarcates Pakistan 
from Kutch in Gujarat. The Gulf of Kutch divides the Kutch region from Saurashtra, and 
the Gulf of Khambat separates the Saurashtra region from the southern corridor of 
Gujarat. The River Narmada forms one of the traditional barriers between North and South 
India. It drains into the Gulf of Khambat. This river basin covers 14% of the land in the 
state of Gujarat. The study populations were sampled across the entire area of 
Gujarat. 


3.2.2 Maharashtra 


Like Gujarat, Maharashtra was carved out as a linguistic state in 1960 
(Agrawal and Agarwal, 1995). Maharashtra lies in the mid-western 
part of India. It is surrounded by the Arabian Sea in the west, Gujarat and Dadra and 
Nagar Haveli in the north, Madhya Pradesh in the northeast, Chhattisgarh in the east, 
Andhra Pradesh in the southeast, Karnataka in the south and Goa in the southwest. 
The state covers 307,731 km² in area and contains two reliefs: the Deccan tableland 
and the Konkan coastal strip. The Sahyadri hills are the backbone of Maharashtra and 
separate the two reliefs. Tribal populations such as Katkari, Warli, Kokni and caste 


Table 2: Demographic table for various study populations 


Census size 
(year) 


Geographical 
Region 


idai 

E” {Hunters asava 
IE |Agriculture South Narmada Kathodia 
tribe} IE i Kotwalia 
tibe} IE i outh Narmada Ratwa 
IE Koli 

JE ourashtra Maldhari 
IE North Narmada Patel 

IE North Narmada 
IE Kutch Brahmin Kutchi 

IE Sourashtra Brahmin Sompuri 

IE i Kutch Gatvi NA 


IE North Narmada 620009 (1901) 


12. 2000000 (approx) 
Sahyadri___——[Parsee_ | Par_| 86 | 90,000 (approx), 


v 


Sp Subsistence Population 


= 
co 
as 
eG, 

- 

F Social 
S| status 
few 


AF! |Agriculture 


Bale o 
Dlwe] Bs 
Ble = 
a [a2 

S 

iy 
a | oo 


= 
| 


ain 


= 


N 
Nn 
oo 
oo 
ron 
— 
a 
w 
p 


N 
f=) 
x 
XQ 
Nv 
vn 
~< 


Rajput 


= 
i] 
mal 
i] 
= 
i) =<) 
n n n 
a eae 
oa o;o 


Priesthood 


Gujarat 
QTSIS[SIYS[QISJye Jy 
SINJOINTE ISI N]eale 
walololwlpAlolElala 
SOlo/RIHIS/ela/ARIBR 
ALATA ITNITAYAITYAINININ 
Lae bs t SIOLRIL SOLS le ye rey: 
BILD] NO] eR | | | | & | WH | WD 
SIlBlA}oalml|[alolara 
Q 1a QlaQ S/a}/elela 
.|2 [2 2 [se ]es |e [ole ia] 
41a 424/414 a s 
oOo |o ojojala;]oelo oa 
— lal Fane fond fone fon! {=| 
om m jm om 


a 
.|2 
n 
+ 
oO 


o 
Ke 
5 
a 
: 
@ 
a 
Qa 
fe) 
a 
= 
p 
5 
be) 


== TTI le 
SYS DD IS [D Is 
QI JQ Ja Ja fa 
AlAlS [o/s 1S 
SS 
CAI [Aa rasa (eal ead (Foal 
IY [&¥ Ja [ey [a 
— i [Bo Ito [Ye Il 

eS 
.| a a} 

s 

o 


= 
io” 
Ss) 
Ke 
5 
a 
: 
@ 
a 
a 
fe) 
& 
= 
p 
5 
be) 


bb 
o-) 


Sahyadri 


is) 
Ka 
5 
Q 
=, 
© 
Z 
@ 
7) 
ie 
go 
zy 


n 
— 
ay 


Maharashtra tota 


= 


riesthood 


Maharastra 

= ]—-]— 

ag 

in lo lu 

Ny IS [ye 

eo 

Wild [Ww 

oo |0Oo {00 

fo} 

a 

oO 

ies) les) 
<=|F |F Zio e & = |F IF 12 ZIP IT(TIFISIF Ze [2 
2 lo 5 = = £& 14 Io {ks PPA lola (RIS (BAIS [ec 
+ 18 =< Da D Ais fla ja dis jalel@ig|eljelela 
pS |= @ lo 2 © 21 14 |e BIS |Z/2)e)2-/8 |e le /s 
B |g a lA 8 8 2 |E |S |# Peele | rl ie” 
|" a a |= |= a 


5 
TQ 
= 
2 
= 
5 
> 
a 
c 


x 
a 
= 
i=} 
a 
3 
5 
uc} 
Zz 
p 
2 
2 


© Karvali 


Fishing 
€ 


S' 


: 
Tribe i 
| SDR [Pastoral 
Lo 


= |e 
@ is 
a he 
Bolo 
els 
S15 
Bg 
3 

ae |e 
Sls 
a |S 
Qa |o 
BIS 
pS | 
aja 
ec jc 


z 
a 
> 
i=} 
a 
3 
5 
us] 
Zz 
p 
a 
2 


| 
as 
oo 
ISS 


~~ 
aS 
oo 
aSS 
+ 
oO 


= 
g 
= 
5 
a 
Sy 
: 
5 
ra 
aw 
bob) 
a 
gab) 
= 


: Karvali 


Karnataka 
= JT lll ll lll Te 
NW IN [IN Te [Ny IN IN IN IN IN [NIN 
W [00 [Co J\o [Sm [Ww [HR [oo joo JW joo [TN 
® IQ IQ IQ IK 1S |B QIK IQ [a 
airy in [nN sa IN JY Jr 
DID IDA IAI IR IYI 1D 
ol Ina JR ]o [oo |oo joo [TN 
NA Tw im Joys [hl [HR [A 
Q1la};]sli8lialasasa 
es |e |o.)o.]e |p i 
n 1H De n n 
a 42 |2 a 
oO oO lo oO 
n 


oe ew | 
NK 
nar 
Bln 


Karnataka total 


n 
6 | 
8 | 
9 | 
|| 
| 
|| 


Fa 
5 


South Bayaluseeme |Iyengar 


z 
a 
a 
5 
a 
a 
5 
tc) 
n 
} 
ES 
ee] 
2 
K 
=a 
= 
n 
© 
@ 
3 
@ 
Qa 
|Z 
SB lo. 
g |» 
be) 
5. 
eS 
p 
S 
Wis} 
eo 
oo 


é 
g 
> 
5 
a 
a 
5 
ti} 
n 
fe) 
ES 
ow 
fav) 
Ke 
pat) 
=I 
n 
fa) 
io) 
3 
to) 
> 
a 


South Bayaluseeme |Kuruba 


) Sahyactr 
Dry land farmers |Sahyadri 2,12,836 (1961) 
Sahyadr 


Keatkari 
Watl 
Kola 
Kor 
Iyn N 


A 
300,000 (approx) 
NA 
A 


Kolam 
Korku 


N 


Brahmin 
Goudsaraswath 


Nn 


Kodava 


Bunts 


w IR |B 


338495 
fss[ oN 
|si{ NA 
i 21,99,170 (2001 
J 3,623 (1961) 
Ye | 64 | NA 
Bha‘ 1,200,000 
NA 
6,382 (1961) 


Kur 9,246 (1961) 


Mogaveera 


co 
pay 


Jenukuruba 
Yerava 

Brahmin Havyaka 
Billava 


Koraga 


B |e g |Z 19 
cg |= Se mlm Ie 
Nn 
BR 


Census size 
(year) 


82 |95,00,000 (2001) 
6] 37.37,609_| 
] 


Geographical 
Region 


lon 
~p 

+ 
+ 
= 
+ 


— 


co [| 00 [oo [oo Joo = 
piece (eal (smell ecard aa) 
Vln lie lw 5 
NTaA]o [Hm lwo de 
i. 

- 


SDR 
Ti 6 


N]Tw 
olo 


oo [oo | o0 
Ne t 
Olu 
Beloays 


a 
n 
oO 

ue} 
& 

3 

z 

E 


aste Brahmin ANV* 


: ; A 
85 [82.02 
Andhra Pradesh total ee! Dain (7, ae 


Total N studied 2522 


Oo 


== 
nIrX 
o};}n 
nAlo 


1 AF: Afroasiatic speaker 

2 IE: Indo European speaker 

2 CDR: Central Dravidian speaker 

4: SDR: South Dravidian speaker 

NA: Not Available 

approx: approximate estimate from Wikipedia (2012) 
# BANV : Brahmin Andhra-Vaidiki and Neiyogi 


populations such as Deshastha Brahmin, Chitpavan Brahmin, Dhangar, Maratha and 
Parsee are the major populations that inhabit this region. Korku and Kolam tribes are 
found in Gavilgad ranges of Satpura hills. Gonds are found in Gondwan region, a hill 
which extends from Vidarbha region of Maharashtra, to the west of Chhattisgarh 
through North of Madhya Pradesh. The present study includes all the populations that 
inhabit these regions. 


3.2.3 Karnataka 


Karnataka is a south-western state of India. It is bordered by the Arabian Sea to the
west, Maharashtra to the north, Andhra Pradesh to the east, Tamil Nadu to the
southeast and Kerala to the southwest. The area covered by the state is 191,976 km².
It has been the homeland of Kannadigas, Kodavas, Tuluvas and Konkani speakers.
Geographically, it has three principal regions: coastal Karavali (Dakshina Kannada
and Udupi districts), hilly Malenadu covering the eastern and western Sahyadri ranges
(Uttara Kannada, Shimoga, Chikkamagaluru, Kodagu and Hassan districts) and the
plains of the Deccan plateau called Bayaluseeme (North Bayaluseeme includes
Belgaum, Gulbarga, Bidar, Dharwad, Chitradurga and Raichur districts; South
Bayaluseeme includes Bangalore, Mysore, Kolar and Mandya districts). The sampled
populations from Karnataka included speakers of all the above-mentioned languages
from the Karavali, Malenadu and South Bayaluseeme regions.


3.2.4 Andhra Pradesh 


Geographically, Andhra Pradesh lies on the south-eastern coast of India. It is
bordered by Maharashtra, Chhattisgarh and Orissa to the north, Tamil Nadu to the
south and Karnataka to the west. To the east is the Bay of Bengal. It occupies an area
of 2,172,000 km². It has three regions: the northern plateau region called
Telangana, the southern part called Rayalseema, and Coastal Andhra. Telangana and
Rayalseema are divided by the river Krishna. Coastal Andhra includes the districts
between the Eastern Ghats and the Bay of Bengal, from the Orissa border in the north
to south of the Krishna delta. These districts include Srikakulam, Vizianagaram,
Vishakapatnam, East Godavari, West Godavari, Krishna, Guntur, Prakasam and Nellore.
This study mainly focused on the populations from Coastal Andhra Pradesh and the
Godavari districts.


3.3 Sample collection and DNA extraction: 

30 ml of plain commercial bottled water (Aqua) was used for collecting each mouth
wash sample. This method is user friendly and non-invasive, and a large number of
samples can be collected in a reasonably short time. In short, 30 ml of water was
given to each volunteer in a plastic cup. The cups, questionnaire and informed consent
form of each volunteer were given unique identifiers. The volunteer was asked to swish
the water in his mouth for one minute and spit the contents into a plastic cup. 50 µl
of 30% sodium azide (P/N 0191/3391/06013, S.D. Fine Chem Ltd) was added as a
preservative to the mouth wash collected, to prevent any further growth of microflora.
The sample was rested for some time for the food particles to settle and then decanted
into a 50 ml tube (Cat No. 227261, Greiner Bio-One). The samples were transported
and the initial step of cell isolation was performed in the makeshift camps.

The samples were centrifuged at 2500 rpm for 10 min to settle the buccal
cells. The supernatant was discarded and 1 ml of White Cell Lysis Buffer
(WCLB) was added to the pellet (Appendix 8). This was transferred to a 1.5 ml micro
centrifuge tube (P/N: 616201, Greiner Bio-One) and couriered to the parent laboratory
at Madurai Kamaraj University, Madurai. The samples were couriered to the laboratory
every three or four days to avoid any damage caused by long-term storage.


In the laboratory, further steps of DNA extraction were carried out by other
research fellows and technicians. The salting out method (Ausubel, 2002) was employed
to extract DNA. The samples received were transferred to a 15 ml tube (P/N:
188271, Greiner Bio-One) and 1 ml more of lysis buffer was added to make up the
volume to 2 ml. The samples were incubated at 42°C for two hours in a slanting
position. After the incubation, 1 ml of 6 M NaCl (SRL 828947, recrystallized) was
added to the samples, which were vortexed for 10 seconds and placed on crushed ice
for 10 minutes. The samples were then centrifuged at 4000 rpm for 10 minutes. The
supernatant was transferred to a fresh 15 ml tube. To this, an equal volume of
100% ice-cold ethanol was added. The samples were mixed gently by rolling and
inversion. Precipitated DNA was visible at the interface in most of the samples, but
in some cases the precipitate was not visible to the naked eye. The precipitate was
centrifuged at 4000 rpm for 10 minutes and the supernatant was discarded, leaving the
pellet in the tube. 1 ml of 70% ethanol was added to the pellet and the suspension was
transferred to a 1.5 ml micro centrifuge tube. The samples were then centrifuged at
8000 rpm for 3 minutes. The 70% ethanol wash step was repeated twice. Finally, the DNA
obtained was air dried, suspended in 150 µl of Tris-EDTA buffer (Appendix 8)
and incubated at 42°C overnight. The DNA was stored at -20°C.


3.4 GENOTYPING: 


3.4.1 DNA dilutions: 

As a first step in the preparation for genotyping, DNA was diluted 10 times
(first dilution stock of 100 µl) in 10 mM Tris, 0.1 mM EDTA buffer, in 96-well flat
bottom dilution trays (P/N 655201, Greiner Bio-One). From this, 2 µl was used for
quantification of DNA by the Quantifiler assay. The data obtained from this assay were
used to prepare subsequent assay-specific templates for dotting (YSNP assay) and
Multiplex assays. All the PCRs were performed as per the manufacturer's
recommendations.
3.4.2 DNA Quantification: 

The DNA thus obtained was quantified using the Quantifiler kit (P/N: 4343895,
Applied Biosystems, ABI) in an ABI 7900HT Fast Real Time PCR system (S/N
279000947, ABI). This assay targets the human telomerase reverse transcriptase gene,
and hence bacterial and other non-human DNA was not measured. The Real Time PCR
employed Taqman chemistry: the probe specific to this gene was labelled with FAM
dye, whereas the IPC (Internal PCR Control) probe was labelled with VIC dye. The IPC
is a synthetic sequence that is present in the Quantifiler PCR mix. It is amplified
with each sample during PCR, which helps in detecting PCR failures and inhibitors.

For the Quantifiler assay, 25 µl of PCR mix was added to each well of a 384-well
optical plate (P/N: 4309849, ABI), 2 µl of the respective DNA was added to it, and the
plate was sealed with an optical sealer (P/N: 4311971, ABI). Eight standards were
tested along with the test samples. The assay was carried out as per the
manufacturer's protocol. The results of the PCR were analysed using Sequence Detection
System v2.3 (SDS) software, ABI. The amount of human DNA present was measured by the
Absolute Quantification method (Fig 6).
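
As a worked illustration of the absolute quantification step, the sketch below fits a
standard curve (Ct against log10 of quantity) to a dilution series, reads an unknown
sample off the curve, and then applies C1V1 = C2V2 to plan a working dilution. It is a
minimal sketch and not the SDS software's implementation: the function names, the
three-fold series starting at 50 ng/µl and the synthetic Ct values are illustrative
assumptions, not data from this study.

```python
import math


def fit_standard_curve(quantities_ng_per_ul, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities_ng_per_ul]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate quantity (ng/µl) from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


def dilution_volumes(stock_conc, target_conc, final_volume_ul):
    """Return (stock volume, buffer volume) in µl using C1*V1 = C2*V2."""
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_conc * final_volume_ul / stock_conc
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical eight-point standard series (assumed values, for illustration only).
standards = [50 / 3 ** i for i in range(8)]
standard_cts = [-3.3 * math.log10(q) + 28.0 for q in standards]

slope, intercept = fit_standard_curve(standards, standard_cts)
sample_conc = quantity_from_ct(26.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, sample={sample_conc:.2f} ng/µl")
```

Samples that amplify in earlier cycles have lower Ct values and therefore map to higher
quantities on the curve, which is the relationship shown in the amplification plot.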


Figure 6: Absolute Quantification: PCR amplification plot (Rn vs. cycle; detector:
Quantifiler Human; threshold: 0.066903534)
Note: The X axis shows the PCR cycle number and the Y axis shows the fluorescence.
Samples with a higher quantity of DNA show amplification in the early cycles of PCR.


3.4.3 NRY - SNP Genotyping:


The samples were genotyped for Y-SNPs using Taqman chemistry, also referred to
as the 5’ nuclease assay. Here, the biallelic states of YSNPs were detected in a Real
Time PCR assay, using probes specific for each allelic state (ancestral and derived).
The probes specific to the ancestral state were tagged with VIC dye (green) whereas
the probes specific to the derived state were tagged with FAM dye (blue) in most cases.
Each probe carried a quencher molecule in addition to its reporter dye. When there is
no hybridization between the probe and the template, no fluorescence is emitted,
due to the proximity of the quencher molecule to the reporter dye (Fluorescence
Resonance Energy Transfer). Upon hybridization of the probe with the template
DNA, the reporter dye is separated from the quencher by the 5’ nuclease activity of
the Taq polymerase used. The fluorescence thus emitted is captured to detect the
derived or ancestral state of the YSNPs in the same well.

All the 2,522 samples were genotyped for a total of 52 YSNPs, using custom-made
probes obtained from Applied Biosystems, Foster City, USA, specifically for The
Genographic Project. Firstly, 1 µl of 10 ng/µl DNA was dotted onto 384-well optical
trays. 96 samples were dotted in each of the four quadrants of the 384-well plate, so
that four different SNP probes could be tested in a single PCR run. The DNA-dotted
plates were allowed to air dry before the PCR setup. The PCR reaction was set up as per
the manufacturer’s protocol (“Allelic discrimination.assay.ABI.online protocol,” 2012).
To the pre-dotted trays, Taqman genotyping master mix (P/N: 4326614, ABI) and
custom-made probe/primer mix were added to make the volume up to 5 µl. The
fluorescence was measured before and after the PCR. The specific alleles detected
were visualised by an allelic discrimination plot in Sequence Detection System (SDS)
v2.0 software (Fig 7).

Stage   Temperature   Time           Cycle
I       95°C          10 min         1
II      95°C          15 sec
        60°C          1 min 30 sec   50


Figure 7: Allelic discrimination plot of NRY SNP genotyping (marker M304; X axis:
Allele X (VIC); Y axis: Allele Y (FAM))
Note: Blue dots (FAM) represent the presence of NRY HG (M304, here) and red dots (VIC)
indicate the absence of this SNP. No Template Control (NTC) and Female Control (FC)
show low fluorescence.


The SDS software output was fed into Autocaller software v2.3 to assign the
ancestral and derived states. The haplogroup assignment programme developed by the
IBM group of The Genographic Project assigned the haplogroups based on the results
of Autocaller using the Y-chromosomal phylogenetic tree 2008 (Karafet et al., 2008).
These were also verified manually using the HG assignment pattern based on the
YSNP hierarchy (Appendix 9).
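
The logic of assigning a haplogroup from ancestral/derived calls on a marker hierarchy
can be sketched as follows. This is only an illustration, not the IBM assignment
programme: the function names are hypothetical, and the tree is a toy subset that uses
only markers named in this thesis (M82 nested under M52), whereas the actual assignment
relied on the full hierarchy of Karafet et al. (2008).

```python
# Toy marker tree: child marker -> parent marker (None = top of this toy tree).
# Assumption: only a handful of markers is modelled, for illustration.
PARENT = {
    "M304": None,   # J
    "M52": None,    # H1
    "M82": "M52",   # H1a, nested under H1-M52
    "M27": None,    # L1
    "M17": None,    # R1a1a
}
HAPLOGROUP = {"M304": "J", "M52": "H1", "M82": "H1a", "M27": "L1", "M17": "R1a1a"}


def lineage(marker):
    """Return the path from a marker up to the top of the toy tree."""
    path = []
    while marker is not None:
        path.append(marker)
        marker = PARENT[marker]
    return path


def assign_haplogroup(calls):
    """Assign the deepest haplogroup supported by the calls.

    calls maps marker name -> "derived", "ancestral" or None (no call).
    Returns a haplogroup label, "unassigned" if no marker is derived, or
    "conflict" if the calls contradict the hierarchy.
    """
    derived = [m for m, state in calls.items() if state == "derived" and m in PARENT]
    if not derived:
        return "unassigned"
    deepest = max(derived, key=lambda m: len(lineage(m)))
    path = lineage(deepest)
    # Every derived marker must lie on a single root-to-tip path.
    if any(m not in path for m in derived):
        return "conflict"
    # No marker on that path may be called ancestral.
    if any(calls.get(m) == "ancestral" for m in path):
        return "conflict"
    return HAPLOGROUP[deepest]


print(assign_haplogroup({"M304": "ancestral", "M52": "derived", "M82": "derived"}))
```

Calls flagged as "conflict" correspond to the cases that would need manual checking
against the YSNP hierarchy or a re-run.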


3.4.4 YSTR Genotyping: 


A set of 17 YSTRs (Appendix 10) was genotyped using the AmpFlSTR Yfiler
PCR amplification kit (P/N: 4359513, ABI). The multiplex PCR assay was set up in a
96-well MicroAmp reaction plate (P/N N801-0560, ABI). In short, 10 µl of 0.2 ng/µl
DNA was used for PCR and amplified in a GeneAmp 9700 thermal cycler as per the
manufacturer’s protocol (“YFiler.ABI.Online.Protocol,” 2012). The cycling conditions
used for this assay are given below:

Temperature profile for YSTR and Multiplex 2 assay:

Stage   Temperature   Time      Cycle
I       95°C          11 min    1
II      94°C          1 min
        61°C          1 min     30
        72°C          1 min
III     60°C          80 min    1
        4°C           ∞


After the PCR, the samples were subjected to a fragment analysis assay. In this
assay, 0.5 µl of the PCR product was added to 9 µl of Hi-Di formamide (P/N: 4311320,
ABI) and GeneScan LIZ-500 internal size standard (P/N: 4322682, ABI) in a fresh
96-well plate. GeneScan LIZ-500 is present in all the samples and is tagged with the
LIZ dye. Allelic ladders, the pre-amplified PCR products of the alleles at the 17
loci, were also included in the fragment analysis assay as reference standards. The
samples were electrophoresed in a 3130xl genetic analyser (S/N 18233 022, ABI) with a
50 cm capillary loaded with Performance Optimised Polymer POP7 (P/N: 4352759, ABI).
The electrophoretic run separated the alleles based on their PCR product length.
Loci whose alleles overlap in size had been labelled with different dyes, thus
allowing each locus to be scored. Gene Mapper v3.1 software assigned the alleles
automatically. The ‘bins’ in this software served as the positions where an allele of
a specific size would be housed (Fig 8). The data were also checked manually and
ambiguous calls were resolved by careful scrutiny or by re-runs.

A custom-made multiplex PCR was also performed for 2 YSTR loci and 6 Y-indels.
The YSTRs and indels studied are listed in Appendix 9. This assay was
called Multiplex 2. The PCR and electrophoretic conditions applied were the same as
those of the YSTR assay. The DNA concentration used for this assay was 2 ng/µl.


3.5 Quality Control: 


Quality control measures were built in at every step of genotyping. During DNA
extraction, care was taken to handle the samples in sterile conditions, as they could
be potentially pathogenic. The sterile laminar flow bench, the work room and benches
were periodically sterilised with 70% ethanol, UV and also by fumigation as required.
Lab coats, gloves and face masks were worn as protective measures while handling the
samples and PCR setups. 0.1% sodium hypochlorite (27908, Qualigens, Mumbai, India)
was used when discarding the DNA extraction reagents and waste solutions.

A lab technician assisted in the preparation of all the dilutions for the various
assays and in the dotting of 384-well plates for the Taqman assay (doer and checker).
All the DNA samples were handled on ice to avoid any degradation. Probe and primer
aliquots were prepared based on the requirement for each assay to avoid excess
freeze-thaw cycles.


Figure 8: YSTR assay data output as seen in Gene Mapper v3.1 software
Note: The upper lane shows the allelic ladder. The alleles (peaks) are shown within the
‘bins’ (grey). The lower lane is the sample showing its characteristic allele peaks for
each locus.


The YSNP PCR runs were validated by using positive controls, negative
controls, female DNA and NTC. The positive controls were samples which
would give a positive reaction for the derived allele under investigation. The negative
controls were samples carrying the ancestral allele of the YSNP under
investigation. The female DNA should show no fluorescence as there would be no
amplification. The No Template Control samples (NTC), which included 1 µl of TE
buffer instead of the DNA, also showed low fluorescence.

For the YSTR assay, the positive and negative controls provided in the Yfiler kit
(“YFiler.ABI.Online.Protocol,” 2012) were included in every PCR setup. The alleles
assigned for the positive controls were verified against the Yfiler kit product insert.
For the Multiplex 2 assay, the DNA of laboratory personnel was used as control. These
samples were used during all the PCR setups. Allelic ladders were run along with
each plate for fragment analysis. The sample allele peaks were compared with those of
the allelic ladder to assign the allele call.

3.6 Statistical Analysis: 

The samples were analysed using various statistical tools. The multi-copy
marker loci DYS385a and DYS385b were eliminated from all the analyses due to the
ambiguity in distinguishing these loci. As DYS389I is embedded in DYS389II, the
STR repeat value of DYS389I was subtracted from that of DYS389II and the difference
was used as DYS389b. The NRY HG frequency table was calculated in Microsoft Excel
2007. Fisher's exact test was performed to assess the non-random behaviour of the
observed frequencies. It was calculated in Microsoft Excel using an add-in (Obert,
2005). Nei's gene diversity (Nei, 1987) was estimated to determine the NRY HG
diversity. This gives the probability that two randomly chosen samples carry different
haplogroups in a given population.
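
The two calculations above can be sketched in a few lines. The example below shows the
nested DYS389 adjustment (DYS389II minus DYS389I, since the longer DYS389II amplicon
contains DYS389I) and the unbiased form of Nei's gene diversity,
H = n/(n-1) * (1 - sum of squared haplogroup frequencies). It is a minimal sketch: the
function names and the sample values are illustrative assumptions, not study data.

```python
from collections import Counter


def dys389b(dys389_i, dys389_ii):
    """Repeat count of the DYS389II-specific segment (DYS389II minus DYS389I)."""
    return dys389_ii - dys389_i


def nei_gene_diversity(haplogroups):
    """Unbiased gene diversity: n/(n-1) * (1 - sum(p_i^2)).

    haplogroups is a list with one haplogroup label per sampled individual.
    """
    n = len(haplogroups)
    if n < 2:
        raise ValueError("at least two samples are needed")
    counts = Counter(haplogroups)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical population of four men (illustrative labels only).
example = ["R1a1a", "R1a1a", "H1a", "J"]
print(dys389b(13, 29), round(nei_gene_diversity(example), 4))
```

A population in which every sample carries the same haplogroup gives a diversity of 0,
and the value approaches 1 as haplogroups become more numerous and more evenly spread.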




The maps showing pie charts of the NRY HG composition in the
respective geographical areas were prepared in SAGA version 2.0.7 (SAGA
Development Team, 2008). The contour maps were constructed with 3DField
v3.5.3 software using the Kriging method (Vladimir, 2012).

To test population differentiation, hierarchical AMOVA
was performed using Arlequin v3.5.1.3 (Excoffier et al., 2005). The fixation indices
for the three hierarchical levels, among groups (Fct), among populations within
groups (Fsc) and within populations (Fst), were computed along with their p values
from 1000 permutations. Fst genetic distances based on YSNP allele frequencies and Rst
distances based on YSTRs were also computed using Arlequin v3.5.1.3.

The evolutionary history of the populations was inferred using the Neighbour-
Joining (NJ) method (Saitou and Nei, 1987). NJ trees were computed and plotted
using the software MEGA4 (Tamura et al., 2007). Pairwise Fst and Rst distances were
used in the computation of these trees. To complement this, Principal Component
Analysis (PCA) (Jolliffe, 1986) was performed using HG frequencies. The eigenvector
associated with the largest eigenvalue gives the direction of the first principal
component. The eigenvector associated with the second largest eigenvalue determines
the direction of the second principal component. The significant principal components
were identified using a scree plot (Cattell, 1966), which indicates the fraction of the
total variance in the data represented by each PC. PCA was computed using R version
2.11.0 statistical software (R.Development.Core.Team, 2010). To assess and visualise
the similarities or dissimilarities among the study populations, Multidimensional
Scaling (Kruskal, 1964) was computed based on Rst distances in R version 2.11.0
statistical software.




The ages of the YSNP lineages of a population were calculated using the Average
Square Difference (ASD) method as described by Sengupta et al. (2006). The
squared differences between all the current Y chromosomes and the founder haplotype
were calculated and averaged over loci. The standard error was computed over loci. The
ASD value was divided by w, where w is the average Y-STR mutation rate of
0.00069 per locus per 25 years (Zhivotovsky et al., 2004). The age was expressed in
kilo years (Kya). Haplotypes from populations with sample sizes of 5 and above for a
given haplogroup were selected for ASD estimates.
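
The ASD calculation can be illustrated with a short sketch. It averages the squared
repeat differences from the founder haplotype per locus, averages those over loci,
and converts to time with the mutation rate of 0.00069 per locus per 25-year
generation quoted above. The function name, the way the standard error is taken
(standard deviation of the per-locus values divided by the square root of the number of
loci) and the example haplotypes are assumptions made for illustration.

```python
import math


def asd_age_kya(haplotypes, founder, mutation_rate=0.00069, generation_years=25):
    """Return (age, standard error) in thousands of years from Y-STR haplotypes.

    haplotypes: list of repeat-count lists, one per chromosome.
    founder:    repeat counts of the assumed founder haplotype.
    """
    n_loci = len(founder)
    per_locus = []
    for j in range(n_loci):
        squared = [(h[j] - founder[j]) ** 2 for h in haplotypes]
        per_locus.append(sum(squared) / len(squared))
    asd = sum(per_locus) / n_loci

    # Standard error over loci (assumed estimator: sample SD / sqrt(number of loci)).
    if n_loci > 1:
        variance = sum((v - asd) ** 2 for v in per_locus) / (n_loci - 1)
        se = math.sqrt(variance / n_loci)
    else:
        se = 0.0

    # ASD / mutation rate gives generations; convert to thousands of years.
    to_kya = generation_years / mutation_rate / 1000.0
    return asd * to_kya, se * to_kya


# Hypothetical two-locus haplotypes (illustrative values only).
haps = [[14, 12], [15, 12], [14, 13], [13, 12]]
age, se = asd_age_kya(haps, founder=[14, 12])
print(f"{age:.2f} +/- {se:.2f} Kya")
```

Because the estimate scales linearly with 1/w, the choice of mutation rate directly
determines the absolute ages, while the relative ordering of lineages is unaffected.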

Phylogenetic networks and mismatch distributions were computed in Network
software version 4.6.10 (Fluxus Technology Ltd, 2012). Reduced Median (RM)
networks were plotted with a reduction threshold of 1 (Bandelt et al., 1999; Forster
et al., 2000). The weights were applied inversely to the STR variance. The weights
assigned to variances of 0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8 and >0.8 were 10, 8, 6, 4
and 2, respectively. The input files were prepared using the programme designed by M/s
Chella softwares, Madurai. Mismatch distributions were also computed using the
Network software for the STRs belonging to the specific haplogroup under study. This
determines whether the observed variance in the populations is an effect of any
demographic event (Slatkin and Hudson, 1991).
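
The variance-based weighting scheme can be written as a small lookup. The sketch below
computes the repeat-count variance of one STR locus and maps it to the weights listed
above. The function names are hypothetical, and two details are assumptions because the
text does not specify them: the sample (n-1) variance is used, and a variance falling
exactly on a class boundary is placed in the lower class.

```python
def locus_variance(alleles):
    """Sample variance of the repeat counts observed at one STR locus."""
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two alleles are needed")
    mean = sum(alleles) / n
    return sum((a - mean) ** 2 for a in alleles) / (n - 1)


def network_weight(variance):
    """Map an STR variance to its network weight (higher variance, lower weight)."""
    if variance < 0:
        raise ValueError("variance cannot be negative")
    # (upper bound of the variance class, weight); boundaries assumed upper-inclusive.
    for upper, weight in ((0.2, 10), (0.4, 8), (0.6, 6), (0.8, 4)):
        if variance <= upper:
            return weight
    return 2


# Hypothetical repeat counts at a single locus (illustrative values only).
observed = [14, 14, 15, 15]
print(network_weight(locus_variance(observed)))
```

Slowly mutating (low-variance) loci thus count more heavily when the network is built,
which reduces reticulation caused by fast-mutating loci.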

The presence of ancient haplotypes in the populations was determined from the
Sum of Squared Distance (SSD) from the median haplotype for that HG. This method
assumes the median haplotype to be the founder haplotype (Sengupta et al., 2006).
Smaller SSDs represent older haplotypes while larger SSDs represent more recent
haplotypes.
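
A minimal sketch of the SSD calculation is given below: the median repeat count is
taken locus by locus to form the assumed founder haplotype, and each haplotype's
squared differences from it are summed over loci. The function names and the example
haplotypes are illustrative assumptions. With an even number of haplotypes the
per-locus median can fall between two repeat counts; how such ties were handled in the
study is not stated, so the sketch simply keeps the fractional value.

```python
from statistics import median


def median_haplotype(haplotypes):
    """Per-locus median repeat count across all haplotypes of one haplogroup."""
    return [median(column) for column in zip(*haplotypes)]


def ssd_from_median(haplotypes):
    """Sum of squared repeat differences of each haplotype from the median haplotype."""
    founder = median_haplotype(haplotypes)
    return [
        sum((allele - ref) ** 2 for allele, ref in zip(hap, founder))
        for hap in haplotypes
    ]


# Hypothetical three-locus haplotypes within one haplogroup (illustrative only).
haps = [[14, 12, 23], [14, 13, 23], [15, 12, 24]]
print(median_haplotype(haps), ssd_from_median(haps))
```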

Coalescent methods implemented in BATWING (Wilson et al., 2003) were
applied to compute the split times of the populations under investigation. This
software assumes no gene flow among the populations. However, it showed
remarkable sensitivity to gene flow between populations with paternal lineages from
the same HGs and lower sensitivity to immigrants bringing newer HGs into the parent
population (ArunKumar et al., 2012; Haber et al., 2012). The priors used for
determining the split times and constructing the phylogenetic tree were as follows:

• Mig model was set to 1, assuming samples are drawn from sub-populations,
and to 0 when no population sub-division was considered.

• Size model was set to 2, i.e., the populations remained constant and then
expanded.

• The mutation rates were based on Zhivotovsky’s evolutionary rate, i.e.,
0.00069/site/generation (Xue et al., 2006). The prior for population size
was based on the ancestral population size during the Pleistocene, i.e.,
10,000 (Harpending et al., 1998).

• The growth rate (alpha) was set to 0.005. The number of generations before
which the population growth starts (beta) was set to 2.

• The number of Markov Chain cycles was set to 1.5 million.


The post-processing of BATWING data was performed in the R v2.11.0
statistical package. The first 0.5 million samples were removed as burn-in. The split
times, TMRCA, total effective population sizes and population expansion times were
determined with 95% confidence intervals. The phylogenetic tree was plotted using
Dendroscope (Huson et al., 2007).
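
The post-processing step can be illustrated with a short sketch that discards the
burn-in portion of an MCMC chain and summarises the remainder by its median and an
equal-tailed 95% interval. The study used R for this step; the Python version below is
only an illustration, and the function name, the use of linearly interpolated quantiles
and the toy chain are assumptions.

```python
def posterior_summary(samples, burn_in, level=0.95):
    """Return (median, lower, upper) of a chain after dropping the burn-in samples."""
    kept = sorted(samples[burn_in:])
    if not kept:
        raise ValueError("no samples left after burn-in")

    def quantile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(kept) - 1)
        low = int(pos)
        high = min(low + 1, len(kept) - 1)
        return kept[low] + (kept[high] - kept[low]) * (pos - low)

    tail = (1 - level) / 2
    return quantile(0.5), quantile(tail), quantile(1 - tail)


# Toy chain: five unconverged burn-in values followed by 101 retained samples.
chain = [1000] * 5 + list(range(101))
mid, lower, upper = posterior_summary(chain, burn_in=5)
print(mid, lower, upper)
```

Dropping the early samples keeps values drawn before the chain has converged from
distorting the interval, as the inflated burn-in values in the toy chain show.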

Structure v2.2 software was employed to detect the underlying genetic
structure among a set of individuals using YSTR data. It implements a Bayesian
model-based clustering method that assigns individuals to hypothetical ancestral
populations. The model assumes that there are K ancestral populations (where K may be
unknown), each of which is characterized by a set of allele frequencies at each locus.
Individuals in the sample are assigned (probabilistically) to populations, or jointly
to two or more populations if their genotypes indicate that they are admixed. It
computes the proportion of the genome of an individual originating from each inferred
population (quantitative clustering method). The number of MCMC cycles was set to
10,000 after a burn-in length of 1,00,000. Several runs with different Ks were
performed. The run with the maximum likelihood for a given K was considered to have
captured the best structure from the data (Pritchard et al., 2000).






4. RESULTS 


4.1 The logic of studying populations from Deccan and Gujarat 

The Deccan is one of the oldest geophysical regions of the world. It is made of
multiple layers of solidified flood basalt and step-like hills, which form the
landscape of this region. At present it is bounded by the Vindhyas on the north and by
three water bodies: the Arabian Sea on the west, the Indian Ocean on the south and the
Bay of Bengal on the east (Britannica Online Encyclopedia, 2012). Currently the Deccan
region consists of the four Dravidian states (viz. Tamil Nadu, Kerala, Karnataka and
Andhra Pradesh), Maharashtra, southern Madhya Pradesh and Orissa. Its topography,
geology and optimal climate, sustained by two monsoons (southwest and northeast), have
supported a variety of species of plants and animals. The Western Ghats, on the
western side of the Deccan plateau, possess a very high diversity of flora and fauna.
Archaeological evidence from the Deccan, especially Karnataka, Jeruru Valley,
supports an early inhabitation of this region by humans during the Palaeolithic
(Petraglia et al., 1998).

Genetic studies based on the NRY have described the first coastal migration
from Africa to Australia through this region (Wells et al., 2001). The laboratory at
Madurai contributed significantly to this discovery, and subsequent studies have
shown a rich genomic diversity and cultural heritage in Tamil Nadu and Kerala, the
two southernmost states of Deccan India. Following this first coastal migration, an
early settlement in the Western Ghats has been suggested based on NRY evidence from
this laboratory (Kavitha, 2008; Arunkumar et al., 2012). Apart from the early
inhabitation of the Deccan, this region has been subject to clear population
differentiation, developing a caste/tribe-specific distribution of NRY, in contrast to
mtDNA (Thangaraj et al., 1999; Bamshad, 2001; Cordaux et al., 2003). The ancient
inhabitation of the Southern Deccan has given rise to a few NRY
haplogroups (L1-M27, H1-M52) (Sengupta et al., 2006).

From the linguistic point of view, the people of the Deccan speak languages
belonging to the Dravidian linguistic family, and the majority of the people at present
speak one or other of the South Dravidian languages, presumably originating from a
common root, proto-Dravidian (Renfrew, 1996). Nonetheless, the Central Dravidian
speakers, the majority of them belonging to the Gond tribe, are present in large
numbers (~3 million), ranging from Eastern Maharashtra and Madhya Pradesh to Orissa. It
has been previously described from our lab that these Central Dravidian speakers are
genetically closer to the geographically proximal Austro-Asiatic speakers than to the
linguistically proximal South Dravidian ones. They have shown evidence of
expansion unconnected with the populations of the southern Deccan, particularly Tamil
Nadu (ArunKumar, 2012).

The NRY genetic structure of Tamil Nadu was laid down during the Palaeolithic
period, well before the Sangam Epoch of Tamil classics dating to ~200 BC and the
introduction of the Varna system (ArunKumar et al., 2012). The Dravidian kinship, as
well as the languages, are thus unique to the Deccan and highly evolved (Trautmann,
1981). Studies on Kerala populations, except select tribes of the Western Ghats, have
shown a levelled presence of different NRY lineages, which was attributed to various
factors such as Roman connections, seafarers and social movements (Kavitha, 2008).
In the light of this scenario, it was of interest to study the NRY profile of various
other linguistic states of the Deccan, since a language co-evolves with culture and
gene pool. Thus I studied 2,522 samples from 33 castes and 18 tribes from four states,
viz. Karnataka, Andhra Pradesh, Maharashtra and also Gujarat (Table 2), which lies on
the coastal route of early population movements into India. All these states have many
exotic tribes and well-differentiated castes. I thus tested whether the caste-tribe
divide, language, geography or other social characteristics explained the genetic
diversity observed among the populations of the Deccan and Gujarat. The fidelity of NRY
markers was appreciable and conclusive, and this was made possible by the sampling
strategy and the genomic techniques employed in The Genographic Project.

In the following chapters of results, the data and analyses are presented
state-wise, so that it is easier to define the various issues in each state. The
results are then interpreted in the light of other data available from our studies on
other states and from other studies, thus drawing a holistic picture of the peopling of
Deccan India through NRY HG and STR markers. The fidelity of these markers to
populations, language and vocation seems to have been determined quite early in the
evolution or settling of these populations in India.

4.1.1 Gujarat — The coastal gateway to India 

A total of 413 samples belonging to 7 castes and 6 tribes residing in various
regions of the state (Kutch, Sourashtra and the regions adjoining the Narmada) were
collected and studied (Fig 5). The Y chromosomal data were analysed to understand
the genetic composition and its relationship to geography, subsistence, language and
social ranking. The ethnographic details of the studied populations are presented in
Appendix 1a.


4.1.1.1 NRY Haplogroup frequency distribution in Gujarat study populations: 


Table 3 presents the list of populations studied and their NRY HG percentage
frequencies. When all the study populations of Gujarat were considered together,
NRY HGs R1a1a-M17 (21.7%), H2-Apt (19.8%) and H1a*-M82 (17.6%)
were present in higher frequencies, accounting for 59% of the total NRY diversity.
When individual populations were considered, the HG R1a1a-M17 was the most


901| Itz | 80 | €1T | 70) 00) 10) 00) 00) Z0 | SE) 00) 68 | £0) Ze) ET | OO | 25 | FO) Fz | FO | siz) €0 | 27 | 20 Z0 | 00) 00 |00\T0}| £z8 Aouanba.y [04 vyeeUAE 
-/+ $89°0 LT 
ra i ee 
-/+ LS88°0 PARTI 
“/+ $SBL'0 a +--+ gg | eyedaeH urmyerg 
“/+ SEb8°0 = Hf} 18 epmon| Fe 
-/+ p9L8°0 88 BIDOARS ON 
-/+ S1L8'0 Lz | Lz | 
aa - EEE 
98S0°0 -/+ €€€S°0 a Vo" ba fd | emserespnoy unmyerg 
96€0°0 -/+ €6LL'0 56 
Ps ree] 07 | | ro | z1t|erlzolzzil rol ooloo looloo Aduonba.y [B}0} B.QYSVALYLA, 
18S1°0 -/+ p6Er'0 €7 ve 
SEO ~/+ 686L'0| $71 —|—_|_ ao — &C 
T8ST'0 -/+ PoEr'0 (4 
OPLO'O -/+ 60680 16 | 16 3 | 61 
S€>0'0 -/+ 686L'0 = Yr [| | ff 4 LI 
L970'0 -/+ L9E8'0| SOI aa = Ly sT 
SEP0'0 -/+ 686L'°0 HE = eyyseysoq UlMIyerg vl 
861 | 00)92T| ZT) 00) 20) 70 | 97 | FT | OS | ZT | S0lo0 Aouanba.y [8,01 ye1elN 
0000'0 -/+ 0000°0 pee ee Gara €1 
8LrT'0 -/+ L99F'0| L'9 rep cl 
8LPT'0 -/+ L99¥'0| $6 —+-+--++|-} Hndurog ururyerg, W 
ObPT'O -/+ LIbh'0] S'ZT Yr | | | ff 4 Tony urayesg ol 
9LIV0 -/+ 71790 =] L 
9LIT0 -/+ T1790 | 9 
0L90°0 -/+ 81L8°0 SOE s 
9LIT0 -/+ 71790 Hf} v 
OLIT'0 -/+ T1790 78 £ 
€L10°0 -/+ 2098'0 = = 4 
9€L0°0 -/+ LSL9°0 | [rs] [ves fer LE IpPIS I 
ANISIOATP OU05) wt | so [49 | x0 | Z are 
Qa 
a4 <a S s/s JP SES ASSIS S/S S/S S/ISISSIN SINE aslo S/n S(G SS ALE cee Joylew INS AUN 
suoryetndod Apnys JusJaJJIP UI SOH AYN SHoweA Jo uoTNqrysip (%) Aouanbory :¢ aqe 


IST NOL 
28 | rtr | €0| 81 | r0| €0 £0 | 60 | 89 | 90 FT| 60 | rr | FO os | 10 | FT £97| 90) FE | OF | £8 | OF | FO pz [Honba.y [e90} ysapeadg EAypUy 
0810°0 -/+ psrs'o| OTT Tl Vl EG fat 16 BUTE 
ZOv0'0 -/+ 07Z8'0 19 o€ €€ vAepe A 
Z810°0 -/+ ¥S98°0| TIT il a 1 wi |e rr ee ee ee |i 06 ndey 
1L80°0 -/+ 1Z¥9'0 os 0z ANY uruyeig, 
0010'0 -/+ 8rT6'0| 66. Le ee Sz Sz 6 18 virteames| > 
16Z0°0 -/+ 81€8'0 EG S01 LE eprlariq uUYyerg & 
S970 ~/+ 1618'0] 69 | ay ve gs nfey| 5 
ZEEO'O -/+ ILZ8'0] Er et é| GC or ereuumey epuoy| 
SOEO'O ~/+ P1S8'0] +S OL ig v's Le App epuoy] 
$9Z0'0 -/+ O€€8'0| 8°9 Lel iG It €L weer 
SESO'O -/+ 8EZ'O] ER 87 87 oS |sia 87 87 87 9€ Sipe 
psro'0 -/+ 6z0L'0| 19 0°01 Pl i TT [ae 06 ient 
6rr0'0 -/+ S190] EL | 19 vt | TT vt vt vt vt ran zB BRA 
&ysvoapouen} 7a | 2 |awt | O} i ]«d | a | OS] fezcofact{«z1]it}ax jar] S| Ss] 8 | «| ca] E | E lath} «Hh | O | wd | so] so |uofualalv| 2 wsito 
- = a uonejndog yo aig 
OH-AUN 5 
“s/ & (s/s /Sl"s SSS] S/S] S188) 8 PES 8 | ESS) eS) SS) 8] 8 8S Psi sl Sr S83] 8 <<< JoqIeUI INS AUN 
N x So & iS BL a a a X S = So i) QQ = So] 4 wo] N is) © o}| © an iv) an wl ole 


SO'0< sanjea d yuRdIJTUSIS-UOU aJROIpUT S][20 YULT_ :o}0N 


cO-dT b0-d9 TO-a'T 16 eUIWe yy 

ce BAeype A 

cO-d'T bO-dT 06 ndey 

90-A'b 0z ANV Uluryerg, 
0-9 T ZO-a I SO-a'€ 18 elipeqmas| & 
70-17 90-a'F +0-4'T 60-a'T 70-47 LE epraeiq urmmyerg a 
wo-a€ CO-HT €0-d'T e0-d'€ LO-a'T 8S ney il 
co-a'b 60-H'T | 80-d'S cO-HL cO-aT or BIBUIUIE YY BPUOY So 
S0-a'8 €0-3'6 LE Appar epuom] B 

a a a a Ta a eC a De NC TPE 

cO-dE s0-a°9 w8 Rew 

Pp 

60-1 SL eqniny, 
7O-HT |80-FE L9-TT e0-a's €L BseIOy 
co-a 8S BARE] B 
1-36 |€0-1'8 90-17 €0-F'€ co-ab 88 eyeAaey Unuyerg| S 
0-78 SO-ab p0-d'8 p0-H'T |SO-'9 $9 BABIO X 5 
cO-dT 9% eqninynuss| = 
SO-ae 19 ByeeUIeYIpy | 
za [erera] «a [erro] 10 d_feveo[eeeeo| ezo | «e1 | «zt | 17 ace Lsmaitamer ar oe bar tiempaa ba a | 99 sha ua | a N uopemndog 
Pot ore TB =pAOD 
SO-LL |ZO-T 1 wO-ae tL syung, 3 
w-a9 os eaepoyl| & 
0-H? | IZ-AP to-day cO-ab 60-21 10-d'S | pO“ L b6 peMserespnor UTUTYe.1g | ps 
90-a'1 €0-a'7 s0-a9 [aa resus] 

3 
cO-aT ce-ae 10-4T cO-ae cO-dE or nMyoy 
cO-dT eO-H'€ |ZO-A'S CO-HE ¢0-a'8 CO-HT Lt weloy 

woah b0-78 cO-aT br THEM 
cO-d'T 7O-a'8 e0-a's Ol-we 9S Leqyey 
SO-aT ral puoy fey 
woah eat Le puopn| = 
0-09 70-a'b wz suey & 
1o-a'z [zo-a's IT eyxoy B 
rate ome ee ed Tamu 
ars wae a 
90-4 T 0-09 pO-HT | €0-A'T 90-1 J eT-aT c0-a'T LO-T cO-a'T 0-1 98 90SIed 
TO-HE cO-aT cL eyyseysoq ulUTyeIg 

q 
90-49 91 Ty Ny UrUAYyerg 
Hal Po 91 yndlew 
LO-H'6 IZ unduwiog urmyerg 
SO-AT SI TAyey 

so-r9 PT feof corre zl EUPIA] o 
TO-A'T 90-1 90-F'T Iv Jed e 
cO-AT CO-AT 61 yoy] = 
b0-TT el BAILY 
e0-a's so-a9 cO-d'T cO-aT €9 BARSEA 
€0-H'8 |80-F'T c0-0T 89 BIEMIOY 
cO-aT €L BIpoyyey 
€C-H'9 |80-F'€ Le TPPIS 
818 ~|B/ Bs a . 
za fPrra} wk fed] 1 | «O | d | & | & [eo | xe | et | it | ox | ocr a | 2 S | « | am |] & & [eH | |] 9 | af | so | so | wo | ua | a us uonemndog B 
o % * 


suonendod aanoadsal ul SOH AYN snow jo sonyea d Lay 


*suonejndod Apnys jua.zajjip ul satduonba.y OH JO AIquipas oy} UrE,1I9SE 0} (L.A) 1821 OVX JOYS JO S}NSOY =p [Quy 


frequent HG present in many caste populations studied, with an absolute frequency in 
Rajput (100%). Brahmin Kutchi (75%, Fisher Exact Test (FET) 6.E-06) and Brahmin 
Sompuri (71.4%, FET= 9.E-07) showed significantly high frequencies of this HG. HG 
Hla*-M82 was found in highest frequencies among Koli (31.6%, FET 1.E-01), a 
caste population subsisting on fishing. It was interesting to note that Siddis, a 
population of recent African descent, showed the highest and most significant 
proportion of HG CR-M168 (54.1%, FET 6.E-23) in consensus with previous reports 
(Ramana et al.,2001). The HG H-M69 clades were predominant in tribal populations 
of Narmada valley (Fig 9). Thus the populations of Gujarat showed a wide spectrum 
of various paternal lineages, indicating their complex histories. 
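The FET values quoted above compare the HG count in one population against the rest of the samples. A minimal sketch of such a test on a 2x2 table follows. It assumes a two-sided test on carrier versus non-carrier counts; the software and exact table layout used in the thesis are not stated here, and the counts for the "other" samples are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    a: carriers of the HG in the focal population
    b: non-carriers in the focal population
    c: carriers in all other populations
    d: non-carriers in all other populations
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x carriers in the focal population
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more probable than the observed one
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# 20 of 37 men carry the HG (54.1%, as reported for the Siddis); the counts
# for the remaining 376 men (4 carriers) are hypothetical.
p_value = fisher_exact_two_sided(20, 17, 4, 372)
print(f"FET p = {p_value:.1e}")
```

A very small p value here means the HG is strongly over-represented in the focal population relative to the pooled remainder.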

On further analysis, the NRY HG composition showed an interesting pattern of distribution when the populations were divided into castes and tribes. Nei's gene diversity of the HG distribution showed that the tribal populations of Gujarat were more diverse (0.8502 +/- 0.0136) than the caste populations (0.7197 +/- 0.0353) (p value <0.0001). The pie diagrams depicting the NRY HG composition among the castes and tribes in various geographical regions of Gujarat further delineated the geographical barriers in the distribution of these castes and tribes, and this was reflected in the NRY. The Narmada valley tribes (Kathodia, Kotwalia, Vasava and Ratwa) showed higher proportions of H1a*-M82 (24.4%, FET 2.E-04) and H2-Apt (36.4%, FET 4.E-21), and the Koli (fishermen) from the Sourashtra region also showed a higher proportion of H1a*-M82, while the Maldhari tribe showed J2a*-M410 (58.3%, FET 4.21E-07) and L1-M27/76 (25%, FET 6.E-03), a HG most frequent in Southern India, particularly Tamil Nadu. In contrast to the HG H-M69 clades, HG R1a1a-M17 was present in almost all the populations, with the highest proportions (~70%) in the Brahmin populations and Gatvi, an agricultural population, followed by Jains (57.9%).
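The gene diversity values compared above can be illustrated with a short sketch. It assumes the usual unbiased estimator of Nei's gene diversity, H = n/(n-1) * (1 - sum of squared HG frequencies); whether the thesis used exactly this form is an assumption, and the HG counts below are hypothetical.

```python
def nei_gene_diversity(hg_counts):
    """Unbiased Nei gene diversity from a list of haplogroup counts."""
    n = sum(hg_counts)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    # Sum of squared haplogroup frequencies (expected homozygosity)
    sum_sq = sum((count / n) ** 2 for count in hg_counts)
    return n / (n - 1) * (1.0 - sum_sq)


# A population fixed for a single HG (e.g. 16 of 16 men) has zero diversity
print(nei_gene_diversity([16]))

# Hypothetical population with three HGs observed 10, 5 and 5 times
print(round(nei_gene_diversity([10, 5, 5]), 4))
```

A population with many HGs at even frequencies approaches 1, while a population dominated by one HG approaches 0.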


Figure 9: Pie charts depicting NRY HG frequencies in study populations of Gujarat

Note: The pie charts are placed in the regions (coordinates) from where the samples were collected. The various NRY HGs are represented by different colours indicated in the key. The number of samples studied is presented in Table 2. Note that the populations studied from the upper half of Gujarat (Kutch and upper Narmada) have more of HG R1a1a-M17 and many populations from Narmada were rich in H clades. The arrow shows the directional flow of the Narmada river (approx).


4.1.1.2 Neighbour Joining tree: 


The evolutionary history of these study populations was inferred using the Neighbour-Joining method. The optimal tree was computed based on Fst distances from HG data (Fig 10a) and Rst distances from STR data (Fig 10b). The caste and tribal populations formed two different clusters in both the trees; however, the caste population Patel and the fishermen population Koli clustered with the tribal populations of Gujarat. All the caste populations except Patel, which has higher frequencies of HG H1a*-M82 and HG J2a*-M410, clustered together in the second arm of the NJ tree. The Siddis stood apart in the tree but clustered distantly with the tribal arm.


4.1.1.3 Analysis of Molecular Variance (AMOVA): 


To test whether the populations show any genetic differentiation when grouped based on language, geography or other social parameters, AMOVA was performed. Fsc indicates the variance among populations within groups, Fct the variation among groups, and Fst the variance among populations. If the grouping captures genetic differentiation among the population groups, Fsc has to be lower than Fct. When the study populations were clustered as castes and tribes, Fsc was higher than Fct for both YSNP and YSTR. However, when Patel and Koli were removed and AMOVA was recomputed, the Fct value (0.248) was about 1.8 times the Fsc value (0.140) (Table 5).
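The relationship between the three AMOVA indices can be sketched from the variance components. The definitions below are the standard hierarchical ones (among groups, among populations within groups, within populations); the variance components in the first call are hypothetical, and the Fst printed at the end is only the value implied by the identity (1 - Fst) = (1 - Fsc)(1 - Fct), not a figure taken from Table 5.

```python
def fixation_indices(var_among_groups, var_among_pops, var_within_pops):
    """Return (Fct, Fsc, Fst) from the three AMOVA variance components."""
    total = var_among_groups + var_among_pops + var_within_pops
    fct = var_among_groups / total                      # among groups
    fsc = var_among_pops / (var_among_pops + var_within_pops)  # within groups
    fst = (var_among_groups + var_among_pops) / total   # among populations
    return fct, fsc, fst


def implied_fst(fsc, fct):
    """Fst implied by Fsc and Fct through (1 - Fst) = (1 - Fsc)(1 - Fct)."""
    return 1.0 - (1.0 - fsc) * (1.0 - fct)


# Hypothetical variance components, for illustration only
fct, fsc, fst = fixation_indices(0.2, 0.1, 0.7)
print(round(fct, 3), round(fsc, 3), round(fst, 3))

# Values reported after removing Patel and Koli: Fct = 0.248, Fsc = 0.140
print(round(0.248 / 0.140, 2))             # ratio of Fct to Fsc
print(round(implied_fst(0.140, 0.248), 3))  # Fst implied by the identity
```

A grouping is informative when most of the between-population variance moves into the among-group component, that is, when Fct rises and Fsc falls.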

Table 6 shows the Fst and Rst distance matrices as a heat map. It can be appreciated that the Fst and Rst distances between caste populations and tribal populations were high. Rajputs, however, were very distinct from the other populations of Gujarat.


Figure 10a: NJ tree based on NRY HG-Fst distances for Gujarat study populations

Figure 10b: NJ tree based on NRY STR-Rst distances for Gujarat study populations

Tip labels recovered from the figures: Siddi(37), Vasava(63), Kathodia(73), Kotwalia(68), Koli(19), Patel(41), Maldhari(12), Jain(18), Brahmin Kutchi(16), Brahmin Sompuri(21), Gatvi(15), Rajput(16)

Note: The numbers within the brackets indicate the sample size (N) studied for each population. The branch lengths indicate the genetic distances between the internal nodes. The pattern of clustering was similar in both the trees, showing a distinction between caste and tribe. Koli and Patel cluster closer to tribal populations.


Table 5: NRY based AMOVA of the various subgroupings of the Gujarat study populations
[Table body not legible in the source.]

Note:
1. Fsc: Variation among populations within groups
2. Fst: Variation among populations
3. Fct: Variation among groups
# Patel and Koli were removed from AMOVA analysis
4. Populations were grouped based on the distance between the populations in kilometers
5. Regions include: Kutch, Sourashtra, North and South Narmada
6. Siddis were eliminated to test if the presence of this population with known migratory history from Africa, carrying ancient markers, would skew the data. But their presence did not affect the AMOVA values drastically
7. Based on marriage patterns, i.e. allowance of widow remarriages

Table 6: Pairwise distance matrix based on NRY HG Fst and YSTR Rst distances for Gujarat study populations
[Table body not legible in the source.]


4.1.1.4 Principal Component Analysis (PCA): 


PCA based on the NRY HG frequencies was computed to determine the genetic relationships among the populations (Fig 11a, 11b). Component 1 and Component 2 contributed 57.6% and 17.3% of the variance respectively, and were influenced by HG R1a1a-M17 and HG J2a*-M410 respectively. The tribal populations, which were distinguished from the rest, were characterized by the NRY HG H1a*-M82 vector. Patel and Koli populations clustered along with these tribal populations irrespective of their caste hierarchy.
4.1.1.5 Multidimensional Scaling (MDS): 

MDS was computed from YSTR-based Rst distances to evaluate the genetic differences among the populations (Fig 12). The analysis gave a stress value of 6.156. The MDS plot reflected the clustering seen in the PCA, though in MDS the populations were more widely distributed, which can be attributed to isolated YSTR evolution within each population. Nonetheless, the plot showed a good distinction between the caste and tribal populations studied.
4.1.1.6 Phylogenetic Network Analysis:

The reduced median phylogenetic networks were computed from YSTRs on the background of each haplogroup to infer evolutionary relationships among various populations within that HG. Only the figures of networks that were highly informative are presented. The interpretations drawn for each HG are presented below:

NRY HG CR-M168: The network was highly reticulated and was present only in the Siddi population.

NRY HG C5-M356: This HG was represented only in Patel, Kotwalia and Vasava populations. The network showed reticulations with no central node and long branches with several unoccupied steps, indicating diverse sources or long-term evolution of these haplotypes.

Figure 11a: Principal Component Analysis of NRY HG frequencies of study populations from Gujarat

Note: The tribal populations are indicated as red squares whereas the caste populations are indicated by yellow circles. The biplot shows the contribution of each haplogroup, represented by lines as component loading vectors. The percentage variance contributed by PC1 and PC2 is shown in the scree plot. The PCA plot showed a distinction between caste and tribal populations.

Figure 12: Multi Dimensional Scaling of NRY-STR Rst distances for populations of Gujarat

Note: The tribal populations are indicated as red squares whereas the caste populations are indicated by yellow circles. The MDS plot showed a distinction between caste and tribal populations.

NRY HG H1a*-M82: This HG was identified mainly in the tribal populations, with long branches and multiple unoccupied steps, indicating the possibility of drift or multiple distant sources of this HG among these study populations (Fig 13a). YSTR evolution was detected in Kathodia, as evidenced by single-step mutations. Minimal haplotype sharing was observed, suggesting no recent gene flow among the populations.

NRY HG H2-Apt: This HG was also identified mainly in tribal populations. Haplotype sharing between Kathodia and Kotwalia was observed, along with stepwise STR evolution. Vasava showed evolution from this Kathodia-Kotwalia cluster, suggesting a common founder group. Many terminal branches were observed in the network, suggesting diverse sources and/or recent in-migration of this HG in various populations (Fig 13b).

NRY HG J2a-M410: This HG was seen mainly in Patel and Maldhari populations. The Maldharis (N=7) showed a unique signature, with the same haplotype at all 17 loci in all the samples. The YSTRs of Patel were more diverse, as shown by long branches, indicating diverse sources for these haplotypes.

HG Q1a3-M346: Kotwalia populations showed stepwise mutations at the periphery of the branches, indicating recent evolution of this HG among Kotwalia.

HG R1a1a-M17: Fig 13c showed two distinct clusters among the Gujarat populations without a central median haplotype. In cluster 1, population-specific clusters were identified among the Brahmins and Gatvi. Population-specific YSTR evolution was observed among Gatvi. Cluster 2 comprised mainly Rajput, Patels and Jains. These clusters indicate unique YSTR differentiation among these populations within HG R1a1a-M17. The tribal populations showed long branches, indicating distant sources for these haplotypes. The caste populations Koli and Patel were present sporadically within the phylogenetic network. This network clearly differentiates two sources of R1a1a-M17 among the Gujarati populations.

HG R2-M124: This HG was sporadically represented in the study populations, with no population-specific cluster. It was characterised by long branches with multiple unoccupied steps, again indicating genetic drift or distant sources for these haplotypes.

Figure 13a-13c: Reduced median phylogenetic network analysis of Gujarat study populations
Figure 13a: NRY HG H1a*-M82
Figure 13b: NRY HG H2-Apt
Figure 13c: NRY HG R1a1a-M17

Note: In cluster 1, population-specific clusters were identified among the Brahmins and Gatvi. Cluster 2 comprised mainly Rajput, Patels and Jains. Each circle represents an individual. The size of the node is proportional to the number of individuals. The length of the branches shows the distance between two connecting nodes. The numbers on these branches represent the number of mutations.

4.1.1.7 Mismatch distributions: 

Mismatch distribution analysis of YSTR data was performed to obtain the molecular distances within a haplogroup for all the populations (Fig 14a-14h). It determines the molecular proximity of the haplotypes in a given population. The mismatch distribution plots of HGs CR-M168 and C5-M356 were characterised by multimodal peaks with high Mean Pairwise Difference (MPD). The networks of these HGs showed heavy reticulations, suggesting the possibility of long-term drift. On the other hand, though the mismatch distribution plots of J2a*-M410 and Q1a3-M357 showed multimodal peaks, the YSTR networks did not show reticulations, indicating multiple sources of YSTRs for these haplogroups, or in-migrations. HG H2-Apt showed a distinct bimodal peak, indicating at least two different sources for this HG. HGs H1a*-M82 and R1a1a-M17 showed a single modal peak, indicating a single source for these haplogroups, and lower MPD values, indicating recent evolution of these haplogroups in the study populations. The network of R1a1a-M17 showed two distinct clusters, while only a single unimodal peak was observed in the mismatch distribution. This could be explained by the fact that the two clusters in the network were only a single step apart.
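The mismatch distribution and the MPD can be sketched as follows. This assumes that the difference between two YSTR haplotypes is counted as the number of loci at which they differ, with an optional stepwise variant that sums repeat differences; which counting rule the thesis software applied is an assumption, and the four-locus haplotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_difference(hap1, hap2, stepwise=False):
    """Difference between two haplotypes given as tuples of repeat counts."""
    if stepwise:
        # Sum of absolute repeat-number differences across loci
        return sum(abs(a - b) for a, b in zip(hap1, hap2))
    # Number of loci at which the two haplotypes differ
    return sum(a != b for a, b in zip(hap1, hap2))


def mismatch_distribution(haplotypes, stepwise=False):
    """Return (histogram of pairwise differences, mean pairwise difference)."""
    diffs = [
        pairwise_difference(h1, h2, stepwise)
        for h1, h2 in combinations(haplotypes, 2)
    ]
    return Counter(diffs), sum(diffs) / len(diffs)


# Hypothetical four-locus haplotypes within one haplogroup
haplotypes = [(14, 12, 30, 24), (14, 12, 30, 24), (15, 12, 30, 24), (14, 13, 31, 24)]
histogram, mpd = mismatch_distribution(haplotypes)
print(sorted(histogram.items()), mpd)

# Seven identical 17-locus haplotypes, as described for the Maldhari J2a samples
_, mpd_identical = mismatch_distribution([(14,) * 17] * 7)
print(mpd_identical)
```

The histogram is what a mismatch plot displays: one peak suggests a single source, several peaks suggest several sources or long-term drift.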


Figure 14a - 14h: Mismatch distribution based on YSTRs within each haplogroup for study populations from Gujarat
(y-axis: Relative frequency)
Panels: HG CR-M168 (MPD: 6.21); HG J2a*-M410 (MPD: 9.20); HG C5-M356 (MPD: 11.60); HG Q1a3-M357; HG R1a1a-M17 (MPD: 6.28); HG H1a*-M82 (MPD: 6.04); HG R2-M124 (MPD: 9.05); HG H2-Apt (MPD: 9.74)


4.1.1.8 BATWING Analysis: 


Three sets of BATWING runs were performed to estimate the divergence times of the populations studied. The first BATWING run included all the Gujarati populations, and a coalescent tree was obtained (Fig 15). The tree showed distinct clustering of castes and tribes, with a coalescent time of ~6 Kya. Maldhari and Ratwa showed a recent split at 977 Ybp. Similarly, Kathodia and Kotwalia showed a recent split at 1.1 Kya. These four populations shared a common ancestor around 2 Kya, and Vasava had a common ancestor with them ~3 Kya. Brahmin Kutchi and Rajputs showed a recent split at 1.5 Kya. These populations coalesce with Brahmin Sompuri around 3 Kya. Jains and Patel showed a split time of 5.7 Kya. The TMRCA for all the Gujarat populations studied was 52,252 Ybp (95% CI: 47,875-71,675), which overlapped with the expansion time for all the lineages (47,195 Ybp (95% CI: 45,037-82,134)), indicating a long-term expansion of these populations in this region (Table 7, 8).

The ancestral effective population size (Na) was estimated to be 1,977 (95% CI: 1,796-2,036) for all the Gujarat populations. The second set of BATWING simulations comprised two independent runs, consisting of (i) the caste populations and (ii) the tribal populations of Gujarat, to estimate TMRCA and Na for each. The total Na of the caste and tribal populations was found to be 2,273 and 998 respectively. The TMRCA of the caste and tribal populations for all lineages was found to be 53,173 and 60,417 years respectively, suggesting that the tribal populations were more ancient than the caste populations.

Figure 15: BATWING based phylogenetic tree for study populations of Gujarat

Table 7: BATWING estimates of Ancestral Effective population size of various study states
(Columns: State; Population; Ancestral effective population size (Na); 95% CI)
[Table body not legible in the source.]
Note: The Ancestral effective population size has to be viewed with caution as populations belonging to different study states have been used in different BATWING simulations

Table 8: BATWING based Ancestral effective population size, TMRCA and population expansion times of study states
(Columns: Study State; N studied; Ancestral effective population size (Na) (95% CI); TMRCA (95% CI); Population expansion time (95% CI))
[Table body not legible in the source.]

Table 9: BATWING based NRY HG age estimate in each study population of Gujarat
[Table body not legible in the source.]

To determine if the ages of the HGs present in each population reflected similar time frames, a third set of BATWING simulations, one for each population, was set up and the TMRCA of each HG in every population was computed (Table 9). Though Koli and Patels clustered with tribes in the previous analyses, their HG age estimates were appreciably different from each other. The HGs C5-M356, H1a*-M82, J2a*-M410, R1a1a-M17 and R2-M124 showed similar ages among Patels and tribals. Similarly, the HGs H1a*-M82, H2-Apt, J2a*-M410, L1-M27/76, R1a1a-M17 and R2-M124 showed similar ages between Koli and the tribes. This further supports the view that Koli and Patel are closer to tribal populations, although they are not similar to each other.

Another interesting observation was that the ages of the various HGs varied more widely within caste populations than within tribes. For example, in Brahmin Sompuri the HG ages range from 17.4 Kya to 26.9 Kya and in Gatvi from 4.9 Kya to 47.8 Kya, while in tribes the range of the age estimates was much smaller, for example Ratwa (21.1 Kya to 35.5 Kya) and Maldhari (33.5 Kya to 37.3 Kya). This suggests that although caste populations have a lower Nei's gene diversity, a result of the lower number of HGs present, the histories of the individual HGs in the caste populations are markedly different from each other, which may be due to multiple events of gene assimilation. On the other hand, in the tribal populations, although Nei's gene diversity (Table 3) is higher, the ages of the HGs are more uniform; this may be because each of the Gujarat tribes evolved from a diverse ancestral gene pool and has not received much gene flow in the recent past.
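The age ranges quoted in this paragraph can be put side by side by computing the span (oldest minus youngest HG age) for each population. The sketch below uses only the four ranges given in the text; the variable names are illustrative.

```python
# HG age ranges in Kya (youngest, oldest), as quoted in the text
hg_age_ranges_kya = {
    "Brahmin Sompuri": (17.4, 26.9),
    "Gatvi": (4.9, 47.8),
    "Ratwa": (21.1, 35.5),
    "Maldhari": (33.5, 37.3),
}

# Span of HG ages within each population, rounded to one decimal place
spans_kya = {
    population: round(oldest - youngest, 1)
    for population, (youngest, oldest) in hg_age_ranges_kya.items()
}

for population, span in spans_kya.items():
    print(f"{population}: {span} Kya")
```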

Overall, the populations of Gujarat showed clear genetic variation in relation to the caste and tribe divide, and the caste populations were found to have more complex histories than the tribes.


4.1.2 Maharashtra - Interface of North & South 

A total of 458 individuals from the seven tribal and six caste populations of Maharashtra were genotyped for Y HGs and STRs (Table 2). The sampling locations are shown in Fig 5. Populations from the Western Ghats and adjoining regions, Marathwada and the interior Gondwana land were sampled. The ethnographic details of the studied populations are presented in Appendix 1b.


4.1.2.1 NRY- Haplogroup frequency distribution in Maharashtra: 


The study samples from Maharashtra, totalling 458, showed appreciable frequencies of NRY HG R1a1a-M17 (20.3%), HG H1a*-M82 (17.9%) and HG R2-M124 (11.1%), followed by HG J2a-M410 (9.4%), HG O2a-M95 (7.9%) and L1 (6.8%), together accounting for 73.4% of the total gene pool (Table 3, 4b and Fig 16). At the population level, Dhangars, a pastoral population, had the highest proportion of R1a1a-M17 (45%, FET 3.E-04), followed by Maratha (44.2%, FET 2.E-04), Deshastha Brahmin (41%, FET 7.E-02) and Chitpavan Brahmin (35.7%, FET 5.E-02). The highest frequencies of HG H1a*-M82 were seen in Raj Gonds and Gonds (75% and 70.3%, FET 2.E-13 and 2.E-05 respectively). Mang, an artisan population, showed high HG R2-M124 (31.8%, FET 6.E-03). Parsee, a highly endogamous Zoroastrian population that migrated from Iran and resides in and around Mumbai, showed a significant proportion of HG J2a*-M410 (33.7%, FET 1.E-13). Korku, an Austro Asiatic (AA) language speaking population, had a very high proportion of O2a-M95 (75%, FET 2.E-32), the commonest HG among AA speakers in Orissa and northeast India. Kolam, a Dravidian speaking tribe in west India, showed equal proportions of HG O2a-M95, HG L1-M27/M76 and R2-M124 (18.5% each). The Nei's gene diversity of the caste populations was 0.8538 +/- 0.0140, comparable to that of the tribes (0.8568 +/- 0.0122).


Figure 16: Pie charts depicting the NRY HG frequencies in the study populations of Maharashtra

Note: The pie charts are placed in the regions (coordinates) from where the samples were collected. The various NRY HGs are represented by different colours indicated in the key. The number of samples studied is presented in Table 2. Note that the Gonds and Raj Gonds show a high proportion of H clades. Korku was rich in HG O2a*. The populations towards the West and South of Maharashtra showed high R1a1a, J2a*, L and H clades.


4.1.2.2 Neighbour Joining tree (NJ) 


The evolutionary relationships of the populations were studied using NJ trees based on Fst and Rst distances (Fig 17a, 17b). In the Fst based NJ tree, the five caste populations studied clustered together with minimal distances between them, while the tribal populations clustered separately, with larger distances from each other. Interestingly, the two Gond populations clustered together. A similar picture was obtained with the Rst based tree, except for the order of the caste populations in the tree. In both the trees, Kolam, the Central Dravidian speaking tribe of Maharashtra, was the closest to Warli and the other tribes, whereas the two Gond populations, the other Dravidian speaking populations, were quite distant. The AA speaking Korku showed the greatest distance from Kolam and the other tribes studied.


4.1.2.3 AMOVA 


AMOVA values were obtained by grouping the study populations based on various parameters. For YSNP AMOVA, the groupings based on three distinct geographical regions (Sahyadri, Gondwana, Satpura ranges) and on language gave higher Fct (0.145 and 0.116 respectively) and lower Fsc (0.104 and 0.107 respectively) values (Table 10). For YSTR, Fct was 0.112 and 0.092 and Fsc was 0.059 and 0.054 for the same two groupings. Other groupings, based on the caste and tribe divide and on subsistence, did not yield any appreciable differences.

The matrix of Fst/Rst pairwise distances among the study populations from Maharashtra is shown in Table 11. It was observed that the Central Dravidian (CDR) populations, such as the Gonds and Kolam, are also genetically distant from each other. Korku, an AA speaking population, stands distinct in comparison to the others.


4.1.2.4 Principal Component Analysis: 


The first two principal components in different populations revealed region 


based clustering (Fig 18a, 18b). PC1 and PC2 contribute 39.4% and 31.2% of the 


56 


Figure 17a: NJ tree based on NRY HG- Fst distances for Maharashtra study popula- 
tions 


0.0 uy, aratha(43) 
00BrahminChitpavan(28) 
oes*BrahminDeshastha(1 2) 


0.02687 Parsee(86) 


0.02510 


0.01959 


Kokani(11) 
RajGond(12) 
*°C’Gond(37) 
Mang(22) 

Katkari(56) 

Warli(44) 


0.06681 


0.02131 


Kolam(27) 


0.007P3 0.25355 Korku(40) 


== 
nn 


Figure 17b: NJ tree based on YSTR- Rst distances for Maharashtra study populations 


Brahmin Deshastha(12) 


0.02789 Pansestse) 


0.007 
PRatkari(56) 
0.04291 Warli(44) 


Kokani(11) 


RajGond(12) 


01189 


0.02522 


0.10073 


Gond(37) 


Kolam(27) 


— Korku(40) 


0.02 


Note: The number within the brackets indicate the sample size (N) studied for each population. 
The branch lengths indicate the genetic distances between internal nodes. Castes form a tight 
cluster while the tribes form a loose cluster different from the castes. 


viIndyes pue eueMpUOy ‘TIpedyes ore suoIsay *¢ 
A[Tuaey OSeNSUL] ULIPIALIG AC 
Ajtuey oSensury] onvIsy Osny :VV 

Ajtuey oSensury] uesdoinq-opuy] :q] ‘Z 


SIoSeIOJ POO} ‘UesHAe ‘poyepor UTUUYeIG ‘SIOWLIL} PUL] 19\\ :POpN[oUT sdua\sIsqns UO paseq sdnoIyH *T 


:10N 


JOON 


suonendod Apnjs vayyservyey 94} JO SUICNOASQNS SNOLIvA 94} JO VWAOQIAV P2Sk¥q AUN 201 AGL 


XICU ISY WLS A Oy} syuosoidas opSuery 1oddn pue x~yeuwl HH AYN 94} sjuosoidal opsuery IoMo'T 
2930N 


WS 8c 0——CicISO = 600s LEO §=—LOE'DO-—s—s LEO = COW HBEO «=SLED = OBEO «= eeEO |= LEV OO Oy 
Ai = =—F| (ss «PAO S00 | SE00 | 9010 = 8110 Zs = = LON 
Tcc0 9910 FO oct.0)=S ISTO) = S810 = EIT'0— STO STO BIEO = 8870s eve Os) 
SAeekoe S100- | [ARSON OR Se ARS A SA co =| puOt fey 
a 00 | | ceo | P10 | e000 | sooo | tooo | eevo | e900 | cioo [ROH 
caraie COO PSK 8100 | = | P00 | S600 NOOK | TRA 
ra 8600 Ara €c00 | roo | 1600 | 400  soro crlo Ir 0 [BD 
ral 0 AKesaa 1700 | 0co0 | C000 | AAO sue 
Wall See 0CO0 (AION 86005] = | 300°0- | S100 RM ZV00) 8 = emern 
ae ewe eo | evo | coo | coo [sro [eco AD eco | 
voc'o RSMO occo §=6slco aD | £700 | 8100 | 1100 | 4o00- | 

1S7'0 1070 ~1'0 | 0L00 rood 0800 FLO | $900 [FSO ee 
Sa 00 ROR coo | cio | tio | teoo | e100 | revo | tooo [oro | [eases omer 


urmryeag I 


soouejsip ISy poseq WLS AUN 


z 
2 
a 
w 
c. 
5 
g 


suonelndod Apnjs vajyyseseye yA] 10} S9IULISIP ISY WLSA PU ISA OH AYN UO posed XLHVVUI 9dULISIP ISIMAIVY 2[] IQ" 


Figure 18a: Principal Component Analysis of NRY HG frequencies of study populations from Maharashtra
[Biplot not reproduced; horizontal axis: PC1.]

Figure 18b: Scree plot for PCA (Figure 18a) components
[Plot not reproduced; axes: Variance % against Principal Components (PC1 to PC10).]


Note: The populations are coloured based on their language affinities. Squares indicate tribal
and circles indicate caste populations. The biplot shows the contribution of each haplogroup,
represented by lines as component loading vectors. The percentage variance contributed by
each PC is represented in the scree plot.


variance respectively and were determined by the HG R1a1a-M17 and HG J2a*-M410
vectors respectively. The IE tribal and non-tribal populations were differentiated along this
vector. The third dimension was determined by the HG H1a*-M82 vector, differentiating the
CDR speaking Gonds and Raj Gonds from the others. HG O2a-M95 in the AA speaking
Korku formed the next dimension, with 7% variation. Thus, overall, the populations
clustered based on their language family.


4.1.2.5 Multidimensional Scaling 


Non-metric MDS computed from the Rst matrix is presented in Fig 19. The stress
value was 7.86 and the R² value was 0.93. Similar to the PCA, the Gonds and the AA
speaking Korku were isolated, which could be attributed to their distinct YSTR profiles
that separated them in different directions from the rest of the populations.
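
The stress and R² values quoted for the MDS fit can be illustrated with a minimal Python sketch. It assumes Kruskal's stress-1 with the input distances used directly as disparities (a metric simplification of the non-metric procedure) and R² taken as the squared Pearson correlation between input and fitted distances. This section does not state which definitions the software used, and the function name `stress_and_r2` and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt


def stress_and_r2(distances, coords):
    """Return (stress-1, R^2) for a low-dimensional configuration.

    distances: symmetric matrix of input distances (e.g. pairwise Rst).
    coords:    one coordinate tuple per population in the MDS plot.
    Assumption: disparities are the raw input distances, not the
    monotone-regressed values used by true non-metric MDS.
    """
    pairs = list(combinations(range(len(coords)), 2))
    observed = [distances[i][j] for i, j in pairs]
    fitted = [dist(coords[i], coords[j]) for i, j in pairs]

    # Kruskal stress-1: residual sum of squares scaled by fitted distances.
    residual = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    stress = sqrt(residual / sum(f ** 2 for f in fitted))

    # R^2 as the squared Pearson correlation of observed vs fitted distances.
    mean_o = sum(observed) / len(observed)
    mean_f = sum(fitted) / len(fitted)
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_f = sum((f - mean_f) ** 2 for f in fitted)
    return stress, cov ** 2 / (var_o * var_f)


# Toy example (hypothetical): three populations whose distances fit a line.
toy_distances = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
toy_coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
stress, r2 = stress_and_r2(toy_distances, toy_coords)
print(stress, r2)
```

If the reported stress of 7.86 was quoted as a percentage, it would correspond to 0.0786 on this scale; that reading is an assumption.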
4.1.2.6 Phylogenetic networks:

NRY HG C5-M356: The haplotypes of this HG were mainly present in the IE speaking
population, Warli. The network showed a central reticulation, long branches and
multiple unoccupied steps, indicating either a distant source of the haplotypes among
these populations or loss of haplotypes by genetic drift.

NRY HG H1a*-M82: The Dravidian speaking Gond and Raj Gond were over-represented
in the network. The branches were strewn from a hypothetical central node. Caste
populations occurred sporadically. All these features indicated less diverse sources
and long term expansion among these populations (Fig 20a).

NRY HG H2-Apt: This HG was mainly localised in the IE speaking artisan group,
Katkari. Dhangar, an IE speaking pastoral population, showed a population specific
cluster. Long branches with single step mutations at the periphery of the network




Figure 19: Multi Dimensional Scaling of NRY-STR Rst distances for populations of
Maharashtra
[Plot not reproduced; horizontal axis: Dimension 1. Legible point labels include Kok, Kor, Kola, Man, Kat, Dh, Bdes, Bchi and Par.]


Note: The populations are coloured based on their language affinities. Squares indicate tribal 
and circles indicate caste populations. 


Figure 20a-20e: Phylogenetic network analysis of Maharashtra study populations: HGs
H1a*, J2a*, L1, O2a, R1a1a

Figure 20a: NRY HG H1a*-M82
[Network not reproduced. Legend: Brahmin Deshastha, Dhangar, Gond, Raj Gond, Katkari, Kokani, Kolam, Korku, Mang, Maratha, Parsee, Warli.]

Figure 20b: NRY HG J2a*-M410
[Network not reproduced. Legend: Brahmin Chitpavan, Brahmin Deshastha, Dhangar, Kokani, Mang, Maratha, Parsee, Warli.]

indicated only recent YSTR based evolution among these populations from 
multiple/diverse sources. 

NRY HG J2a*-M410: The haplotypes were mainly present in the Parsee population.
However, the Brahmin populations also showed some population specific clusters. The
presence of long branches and stepwise mutations at the periphery indicated recent
evolution of these haplotypes in the study populations (Fig 20b).

NRY HG L1-M27/76: The haplotypes of this HG were sporadically distributed
among the populations of Maharashtra. No population specific clusters were found.
The central node was occupied by the Central Dravidian speaking population, Kolam.
The haplotypes showed long radiating branches with multiple unoccupied steps,
indicating distant sources of the haplotypes (Fig 20c).

NRY HG O2a-M95: This HG was localised mainly among the AA speaking Korku.
This population showed single step YSTR evolution and also occupied the central
node. The YSTR haplotypes of this HG were not shared with other populations. All these
features indicated long term isolation and evolution of this HG among the Korku (Fig 20d).

NRY HG R1a1a-M17: This network showed two distinct clusters (Fig 20e). Cluster
1 was mainly composed of Brahmins, Dhangar, Marathas and Parsees. Cluster 2 was
mainly composed of Dhangar, Marathas and CDR speaking tribal populations. The
Katkaris shared all the 17 YSTRs within the population, indicating a recent single
source of this HG in this population. Unlike cluster 1, cluster 2 did not include any
of the Brahmin populations, which could indicate a different YSTR evolutionary
pattern within this HG among the Brahmin populations.

NRY HG R2-M124: This HG was distributed mainly among the IE speakers of
Maharashtra. Parsee showed a population specific cluster. Long radiating branches in




Figure 20c: HG L1-M27/76
[Network not reproduced. Legend: Brahmin Chitpavan, Dhangar, Gond, Katkari, Kolam, Korku, Mang, Maratha, Warli.]

Figure 20d: HG O2a-M95
[Network not reproduced.]

Figure 20e: HG R1a1a-M17
[Network not reproduced. Legend: Brahmin Chitpavan, Brahmin Deshastha, Dhangar, Gond, Raj Gond, Katkari, Kokani, Kolam, Korku, Mang, Maratha, Parsee, Warli.]


the network indicated genetic drift or a distant source for the YSTRs among the
populations.
4.1.2.7 Mismatch distributions: 

Mismatch distribution analysis was performed for all the YSTRs within a HG
for all the study populations (Fig 21a-h). HG C5-M356 showed multimodal peaks
that could indicate multiple sources of this HG or loss of haplotypes due to genetic
drift. HG H1a*-M82 showed a unimodal peak and a high mean pairwise difference (MPD)
of 7.292. HG R1a1a-M17 showed two peaks, one with a higher frequency than the other.
This can be correlated with the two clusters of R1a1a-M17 haplotypes in the network
analysis. This plot, with a high MPD of 8.932, indicated signatures of demographic
expansion, probably from two distinct sources. HG H2-Apt, however, showed a bimodal
peak with a high MPD value (12.635), indicating at least two different sources for this
HG. HGs J2a*-M410, L1-M27/76, O2a-M95 and R2-M124 showed multimodal peaks, suggesting
multiple sources for these HGs among the study populations of Maharashtra.
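
How a mismatch distribution and its MPD are obtained from YSTR haplotypes can be sketched as below. This is a minimal illustration, assuming that the difference between two haplotypes is the number of loci with different repeat counts; the software used for the thesis may instead sum repeat-size differences. The function names and the four toy haplotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_difference(h1, h2):
    """Number of loci at which two haplotypes carry different repeat counts."""
    return sum(1 for a, b in zip(h1, h2) if a != b)


def mismatch_distribution(haplotypes):
    """Return (relative frequency per difference class, mean pairwise difference)."""
    diffs = [pairwise_difference(a, b) for a, b in combinations(haplotypes, 2)]
    counts = Counter(diffs)
    total = len(diffs)
    distribution = {k: counts[k] / total for k in sorted(counts)}
    mpd = sum(diffs) / total
    return distribution, mpd


# Toy data (hypothetical): four haplotypes typed at three YSTR loci.
haplotypes = [(14, 12, 28), (14, 12, 29), (15, 12, 28), (14, 13, 30)]
distribution, mpd = mismatch_distribution(haplotypes)
print(distribution, mpd)
```

Plotting `distribution` as a bar chart gives the kind of unimodal or multimodal profile discussed in the text.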
4.1.2.8 BATWING Analysis and ASD estimates: 

The phylogenetic tree computed using BATWING (Fig 22) showed three
distinct evolutionary groups. The clustering obtained by this analysis was similar to
that obtained from PCA and MDS (Fig 18, 19). Brahmin Deshastha and the warrior
Maratha showed a split time of 1,243 years. Brahmin Chitpavan had a recent split
from these populations 1,019 years ago, and the CDR speaking Kolam shared
ancestry with these populations ~4 Kya. The second branch comprised the IE
speaking tribes and the AA speaking tribe, which separated ~10 Kya. The DR speaking
Gond and Raj Gond had a recent split 3,016 years ago. The Parsee, originally from
Iran, joined the tree at the common ancestor (11.5 Kya) of these two branches, showing
that they are distinct from the Brahmins and all other populations of Maharashtra.




Figure 21a-21h: Mismatch Distribution plots based on YSTRs within a HG
for study populations from Maharashtra
[Plots not reproduced; vertical axis: relative frequency. Panels and mean pairwise differences (MPD): HG C5, MPD 11.4; HG H1a*, MPD 7.292; HG H2, MPD 12.635; HG J2a, MPD 12.819; HG L1, MPD 7.359; HG O2a, MPD 5.917; HG R1a1a, MPD 8.932; HG R2, MPD 8.149.]


Figure 22: BATWING phylogenetic tree for the populations of Maharashtra
[Tree not reproduced; node ages are not recoverable from the scanned page. Legible tip labels include Raj Gond, Brahmin Chitpavan, Maratha and Brahmin Deshastha.]

Note: The branch lengths represent ages in Kya.


The ancestral effective population size of the Maharashtra study populations
was found to be 6,310 (95% CI: 5,882-6,640). The TMRCA was 57,868 Ybp (95%
CI: 55,519-62,678), and the population expansion time was 47,921 Ybp (95% CI:
44,899-49,656). The TMRCA (Table 8) and ancestral effective population size of the
caste populations were 64,756 Ybp (95% CI: 42,198-1,08,638) and 1,565 (95% CI: 747-
4,114) respectively, whereas those of the tribal populations were 60,326 Ybp
(95% CI: 3,09,088-1,01,847) and 1,855 (95% CI: 581-4,618) respectively. The TMRCA
of the tribes overlapped with that of the castes.

The ASD age estimates calculated for each HG in every population gave an
interesting picture of the histories of the populations (Appendix 11). The ASD age for
HG H1a*-M82 was uniform among the populations, which, coupled with the diverse
clusters observed in the network (Fig 20a), suggests that the source of this HG among
the Mang, Gond, Katkari and Warli was unique. On the other hand, R1a1a-M17
showed diverse age estimates among the populations. The Brahmin populations
showed the highest age of 27.3±11.4 Kya, followed by Dhangar (19.4±3.5 Kya). Though
the network analysis (Fig 20e) showed two distinct clusters, most of the samples,
except those of Brahmins and Parsee, were distributed in both clusters. The wide range
of age estimates of R1a1a-M17 among these populations suggests a diverse source of
this HG among them.
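
The ASD (average squared distance) age estimate can be sketched as follows. This is a minimal illustration, assuming that the modal haplotype stands in for the founder, a mutation rate of 6.9e-4 per locus per generation and 25 years per generation. These parameter values and the names `modal_haplotype` and `asd_age` are assumptions for illustration and are not taken from this section.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most frequent repeat count at each locus (assumed founder haplotype)."""
    n_loci = len(haplotypes[0])
    return tuple(
        Counter(h[i] for h in haplotypes).most_common(1)[0][0]
        for i in range(n_loci)
    )


def asd_age(haplotypes, mutation_rate=6.9e-4, generation_years=25, founder=None):
    """Return (ASD, age in years) for a set of YSTR haplotypes.

    ASD is the squared repeat difference from the founder, averaged over
    chromosomes and loci; dividing by the per-locus mutation rate gives
    the age in generations. Both default parameters are assumptions.
    """
    if founder is None:
        founder = modal_haplotype(haplotypes)
    n_loci = len(founder)
    squared = sum((h[i] - founder[i]) ** 2 for h in haplotypes for i in range(n_loci))
    asd = squared / (len(haplotypes) * n_loci)
    return asd, asd / mutation_rate * generation_years


# Toy data (hypothetical): four chromosomes typed at two YSTR loci.
toy = [(14, 12), (14, 12), (15, 12), (14, 13)]
asd, years = asd_age(toy)
print(asd, round(years))
```

Because the age scales inversely with the assumed mutation rate, a different rate changes every estimate by the same factor and leaves the ranking of populations unchanged.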

The statistical analysis of the Maharashtra populations thereby supports the view
that geography and languages have co-influenced the genetic structuring of these
populations.




Results 4.1.3 Karnataka - The land of mid-Western Ghats 

The regions of sampling in Karnataka are shown in Fig 5. A total of 877
volunteers belonging to 3 tribes and 10 caste populations were studied for their YSNP
and YSTR polymorphisms (Table 2). Attention was paid to the South Canara region
because it adjoins the Nilgiris, both belonging to the moist deciduous forests of the
Western Ghats. North Karnataka was purposefully avoided because of later
invasions and expansions such as that of the Vijayanagar Empire. The ethnographic
details of the populations studied are given in Appendix 1d.
4.1.3.1 Y Haplogroup frequency distribution 

When all the samples from Karnataka were analyzed for NRY HGs, R1a1a-M17
and H1a*-M82 were present at frequencies > 20% (24.1% and 21.8%
respectively). HGs R2-M124, L1-M27/M76, F*-M89 and J2a*-M410
were present in the range of 5% to 10%, whereas HGs C5-M356, H*-M69, H2-Apt,
L3*-M357, J2a4c-M68, J2b-M221/M102 and Q1a3-M346 were present sporadically in
the range of 1-3% (Table 3, 4c, Fig 23).

When individual populations were considered, F*-M89 was present at
frequencies of 2-23.4% in all populations except the Brahmins and a tribal population,
Koraga. HG H1a*-M82 was present in all the populations, with the highest frequency
in Koraga (89%, FET 2.E-38). HG J2a*-M410 was over-represented in Iyengars
(28.6%, FET 1.E-06). HG L1-M27/76 was also present in all the populations, with a
high frequency in the Yerava tribe (17.2%, FET 2.E-02), which subsists on wet/dry land
farming. HG L3*-M357 was highest in Brahmin Havyaka (14.8%, FET
2.E-06) among the study populations of Karnataka. HG R1a1a-M17 was ubiquitous in
all the populations, with the highest frequency in Brahmin Goud Saraswath (67%, FET




Figure 23: Pie charts depicting NRY HG frequencies in the populations of Karnataka
[Map not reproduced.]

Note: The pie charts are placed in the regions (coordinates) of sample collection. The
various NRY HGs are represented by different colours indicated in the key. The number of
samples studied is presented in Table 2. No specific pattern of distribution of NRY HGs is
observed, but HG F*-M89 was mainly seen towards the east side of Karnataka.


4.E-21). Brahmin Havyaka also showed a high frequency of R2-M124
(39.8%, FET 9.E-15).
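
The FET values quoted with each frequency are presumably p values from Fisher's exact test on a 2x2 table (carriers and non-carriers of a HG in one population against all other samples); that reading is an assumption. Under it, a two-sided test can be sketched in Python with the standard library alone. The function name and the toy counts are hypothetical and are not the thesis data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]].

    a, b: carriers and non-carriers of the HG in the population of interest.
    c, d: carriers and non-carriers in all remaining samples.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Toy counts (hypothetical): 3 of 4 carriers in one group, 1 of 4 in the other.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(p_value)
```

Very small values such as 2.E-38 arise when a HG is concentrated almost entirely in one population of a large sample.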

When the genetic diversity of the tribal and caste populations was compared,
the caste populations possessed significantly higher diversity (0.8578 +/- 0.0075)
than the tribes studied (0.7499 +/- 0.0314), with a p value < 0.0001.
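
A minimal sketch of the diversity statistic follows, assuming the values above are Nei's unbiased gene diversity computed from haplogroup counts (the Andhra Pradesh section names it "Nei gene diversity"). The function name and the toy counts are hypothetical.

```python
def nei_gene_diversity(counts):
    """Nei's unbiased gene diversity h = n / (n - 1) * (1 - sum(p_i ** 2)).

    counts: number of chromosomes observed for each haplogroup.
    """
    n = sum(counts)
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq)


# Toy counts (hypothetical): 10 chromosomes spread over three haplogroups.
h = nei_gene_diversity([5, 3, 2])
print(round(h, 4))
```

A population dominated by a single haplogroup gives a value near 0, while many equally frequent haplogroups push the value towards 1.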


4.1.3.2 Neighbour Joining Tree: 


The evolutionary histories of the study populations were inferred from NJ trees
based on NRY HG Fst and YSTR based Rst distances (Fig. 24a, 24b). Brahmin
Havyaka was genetically distant from the other Brahmin populations. The wet/dry land
farming populations (Bunts, Mogaveera, Gowda) shared minimal distances in both
NJ trees. Koraga, an artisan tribal population, was isolated from the other populations.


4.1.3.3 AMOVA 


The 13 study populations from Karnataka were grouped based on various
parameters such as caste-tribe divide, geographical regions (Karavali, Malenadu and
South Bayaluseme), subsistence (Brahmins, wet/dry land farming, artisans and food
foragers), language families (Dravidian and Indo-European) and language dialects
(Kannada, Tulu, Sanskrit, Kodava takk and Koraga), and Analysis of Molecular
Variance was calculated (Table 12). The variance among groups (Fct) was lower than
the variance among populations within groups (Fsc) in all methods of grouping. When
the variance among populations (Fst) based on SNPs is greater than the Fst based on
STRs, this can be attributed to lineage specific gene flow into the populations. In the
study populations, the Fst values of SNPs were found to be greater than the Fst of
YSTRs. The AMOVA results suggested that genetic variations between populations
grouped on the basis of social characteristics, language or geographical affiliations
were not appreciable and




Figure 24a: NJ tree based on NRY HG-Fst distances for Karnataka study
populations
[Tree not reproduced. Tip labels (sample size): Iyengar (42), Brahmin Goudsaraswath (94), Kodava (50), Bunts (74), Mogaveera (88), Gowda (81), Adikarnataka (64), Jenukuruba (26), Yerava (64), Brahmin Havyaka (88), Billava (58), Koraga (73), Kuruba (75); scale bar 0.02.]

Figure 24b: NJ tree based on NRY STR-Rst distances for Karnataka study
populations
[Tree not reproduced. Legible tip labels (sample size): Iyengar (42), Brahmin Goudsaraswath (94), Kodava (50), Bunts (74), Mogaveera (88), Billava (58), Gowda (81), Brahmin Havyaka (88), Jenukuruba (26), Yerava (64), Adikarnataka (64), Koraga (73); scale bar 0.02.]


Note: The numbers within the brackets ( ) indicate the sample size for each population.
The branch lengths indicate the genetic distances between internal nodes. No definite
grouping pattern based on language, caste-tribe divide, geography or other social
characters was observed. Only wet/dry land farming populations (Kodava, Bunts and
Gowda) form a cluster in both the Fst and Rst based NJ trees.


Table 12: NRY based AMOVA of the various subgroupings of the Karnataka study populations
[Table body not recoverable from the scanned page; legible row labels: All population, Caste/Tribe.]

Note:
1. IE: Indo-European language family. All Brahmin related populations were considered to be IE (Sanskrit) speakers irrespective of their geography. DR: Dravidian language family.
2. Geographical regions included: Karavali, Malenadu, South Bayaluseme.
3. Subsistence: food foragers, wet land agriculture, artisan, Brahmin related populations.
4. Language dialects: Kannada, Kodava takk, Tulu, Koraga.
5. Populations such as Bunts, Billava, Mogaveera and Koraga are matriarchal societies; Gowda, Kuruba, Adi Karnataka and Brahmins are patriarchal societies (marriage patterns).


confined to intra-population variation only. This observation is in contrast to that
in Gujarat and Maharashtra.
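
The relation between the three AMOVA fixation indices discussed above can be sketched from the hierarchical variance components. This is a minimal illustration of the standard definitions; the function name and the toy variance components are hypothetical and are not values from Table 12.

```python
def amova_fixation_indices(var_among_groups, var_among_pops, var_within_pops):
    """Return (Fct, Fsc, Fst) from the three hierarchical variance components.

    Fct: among groups relative to the total variance.
    Fsc: among populations within groups, relative to the within-group variance.
    Fst: among populations (all levels) relative to the total variance.
    """
    total = var_among_groups + var_among_pops + var_within_pops
    fct = var_among_groups / total
    fsc = var_among_pops / (var_among_pops + var_within_pops)
    fst = (var_among_groups + var_among_pops) / total
    return fct, fsc, fst


# Toy components (hypothetical): most variance lies within populations.
fct, fsc, fst = amova_fixation_indices(1.0, 2.0, 7.0)
print(fct, fsc, fst)
# The indices satisfy (1 - Fst) = (1 - Fsc) * (1 - Fct).
print(abs((1 - fst) - (1 - fsc) * (1 - fct)) < 1e-12)
```

A low Fct alongside a higher Fsc, as reported for Karnataka, means the chosen grouping explains little beyond the differences between individual populations.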

Table 13 shows the Fst and Rst based genetic distances. The Koraga and
Jenukuruba were genetically distinct from many other populations, showing the highest
distances from Goud Saraswath, Iyengar and Kodava.


4.1.3.4 Principal Component Analysis 


Fig. 25 shows the PCA plot. PC1 explained 53.4% of the total
variance, mainly contributed by NRY HG R1a1a-M17. Brahmin populations such as
Goud Saraswath (BGS) and Iyengar (Iyn_K) were differentiated by this vector. PC2
contributed 20.9% (NRY HG H*-M69) and differentiated Jenu Kuruba, a tribal
population that subsists on honey collection. Yerava and Adikarnataka populations
were differentiated by the F*-M89 vector. The percentage variance contributed by each
PC is shown in the scree plot (Fig. 25b).

The NM-MDS plot, computed from Rst distances, gave a stress value of
6.31 and R² = 0.95 (Fig. 26). The MDS plot was similar to the PCA in its
clustering pattern.


4.1.3.5 Phylogenetic networks:


NRY HG C5-M356: The haplotypes of this HG were seen mainly in Yerava, a tribal
population, and Brahmin Havyaka, with no central median node. Brahmin Havyaka
formed a population specific cluster with no haplotype sharing with other populations.
The branches within this cluster showed mutations that were one to two steps away.
The other populations showed long radiating branches with several unoccupied steps,
indicating either a distant source of the haplotypes among them or loss of haplotypes
by genetic drift.




Table 13: Pairwise distance matrix based on NRY HG Fst and YSTR Rst distances for Karnataka study populations
[Matrix values not recoverable from the scanned page.]

Note: Lower triangle represents the NRY HG Fst matrix and upper triangle represents the YSTR Rst matrix.


Figure 25a: Principal Component Analysis of NRY HG frequencies of study populations
from Karnataka
[Biplot not reproduced; axes: PC1 and PC2. Legend: Artisan, Food Foragers, Brahmin related, Wet/Dry land farming.]

Figure 25b: Scree plot for PCA (Figure 25a) components
[Plot not reproduced; axes: Variance % against Principal Components (PC1 to PC9).]


Note: In the PCA, squares represent tribal and circles represent caste populations. The
populations are coloured based on their mode of subsistence. The biplot shows the
directionality of the loading vectors. The percent variance contributed by each PC is given
by the scree plot. Note that most of the wet/dry land farmers are clustered together at the
center of the PCA.


Figure 26: Multi Dimensional Scaling of NRY-STR Rst distances for populations of
Karnataka
[Plot not reproduced; axes: Dimension 1 and Dimension 2. Legend: Artisan, Food Foragers, Brahmin related, Wet/Dry land farming.]


Note: The plot shows the clustering of the wet land farmers at the center of the plot. The
Brahmin populations were diverse.


NRY HG F*-M89: Four populations showed distinct population specific clusters:
Adikarnataka, Yerava, Jenukuruba and Gowda. Adikarnataka formed a distinct cluster
with single step mutations, thereby indicating YSTR evolution of this paragroup.
Yerava were represented by long radiating branches. Gowda showed long branches
with single step mutations towards the periphery. Jenu Kuruba, a foraging tribal
population, was also represented by long branches with multiple unoccupied steps.
Gene flow between the Gowda and Adikarnataka populations was observed. All these
indicate that these populations experienced long term genetic drift, with traces
of recent evolution of these haplotypes in some populations and minimal recent gene
flow among them (Fig. 27a).

HG H*-M69: This network was characterised by three distinct clusters. One of them
was mainly occupied by the food gathering Jenukuruba population, whereas
populations in the other two clusters were sporadic. The network showed reticulations
with no central node. These features indicated that these populations had different
sources of this HG and have experienced long term drift.

NRY HG H1a*-M82: This network was mainly represented by Koraga, an artisan
population, and Kuruba, a pastoral population. The network clearly shows two
different directions in which the populations have expanded (Fig. 27b). Haplotype
sharing was observed between the Koraga and Iyengar populations, and also among
the Kodava, Billava, Adikarnataka and Koraga populations. This indicated gene flow
among, or origin from a common ancestor of, the populations studied.

NRY HG H2-Apt: Brahmin Goudsaraswath showed a population specific cluster and
YSTR evolution among them. The other cluster, with long radiating branches, included
sporadic representation of different populations, indicating diversification of YSTRs
among Brahmin Goudsaraswath and other populations of Karnataka.




Figure 27a-f: Phylogenetic network analysis of Karnataka study populations

Figure 27a: NRY HG F*-M89
[Network not reproduced. Legend: Adikarnataka, Billava, Bunts, Gowda, Jenukuruba, Kodava, Kuruba, Mogaveera, Yerava.]

[Second network on this page not reproduced; its caption is missing from the scan. Legend: Adikarnataka, Billava, Brahmin Goudsaraswath, Brahmin Havyaka, Bunts, Gowda, Iyengar, Jenukuruba, Kodava, Koraga, Kuruba, Mogaveera, Yerava.]


NRY HG J2a*-M410: This network was found to have two distinct clusters, of
Brahmin Havyaka and Iyengar, with stepwise mutations. These indicate long term
isolation and evolution of these YSTRs in the above mentioned populations. Billava
also showed a population specific cluster. No median haplotype was found and all the
samples were in the periphery (Fig. 27c).

NRY HG J2b-M221/M102: Hypothetical central nodes with long branches were
noticed in this network. No population specific clusters were identified. The YSTR
haplotypes within HG J2b could have had diverse origins in these populations.

NRY HG L1-M27/76: The network was characterised by a central median haplotype
composed of the Kuruba population. Long radial branches with several unoccupied
steps radiated from this node in all directions. Yerava formed a specific
cluster. Haplotype sharing was observed. The lack of other population specific clusters,
together with limited evolution, is suggestive of either gene flow among the populations
or a recent origin from a diverse source (Fig. 27d).

NRY HG L3*-M357: The haplotypes were over-represented in Brahmin Havyaka,
which showed a distinct population specific cluster with single step mutations in the
branches, indicating long term YSTR evolution within this HG. Gowda (N=5) did not
show any YSTR evolution within this HG, indicating recent in-migration.

NRY HG R1a1a-M17: The network had a hypothetical median central node, with the
branches radiating around this node. Brahmin Goudsaraswath was represented
throughout the network, indicating high YSTR diversity of HG R1a1a-M17
among them. Iyengars also showed a population specific cluster. Minimal
haplotype sharing was also observed among different populations (Fig 27e).




Figure 27c: NRY HG J2a*-M410
[Network not reproduced. Legend: Adikarnataka, Billava, Brahmin Havyaka, Bunts, Gowda, Iyengar, Kodava, Kuruba, Mogaveera, Yerava. Labelled cluster: Iyengar.]

Figure 27d: NRY HG L1-M27/76
[Network not reproduced. Legend: Adikarnataka, Billava, Brahmin Goudsaraswath, Brahmin Havyaka, Bunts, Gowda, Iyengar, Kodava, Koraga, Kuruba, Mogaveera, Yerava.]

Figure 27e: NRY HG R1a1a-M17
[Network not reproduced. Legend: Adikarnataka, Billava, Brahmin Goudsaraswath, Brahmin Havyaka, Bunts, Gowda, Iyengar, Jenukuruba, Kodava, Koraga, Kuruba, Mogaveera, Yerava.]

[Further network not reproduced; its caption is missing from the scan. Legend: Adikarnataka, Billava, Brahmin Goudsaraswath, Brahmin Havyaka, Bunts, Gowda, Iyengar, Kodava, Koraga, Kuruba, Mogaveera, Yerava. Labelled clusters: Brahmin Havyaka, Yerava.]

NRY HG R2-M124: This network is characterised by central reticulations and long
radiating branches. Brahmin Havyaka and Yerava show population specific clusters
with single step mutations among them. Haplotype sharing was observed between the
IE speaking Brahmin Havyaka and the DR speaking Mogaveera. Kuruba showed distinct
evolution among the YSTRs within this HG (Fig. 27f).


4.1.3.6 Mismatch Distributions: 


Fig. 28a-28k show the mismatch distributions for each haplogroup. NRY
HGs F*-M89 (MPD 13.117), H1a*-M82 (MPD 6.797), L1-M27/76 (MPD 7.716) and
R1a1a-M17 (MPD 7.876) showed a clear unimodal peak and high MPD, indicating long
term demographic expansion. Although unimodal peaks were observed in these HGs, the
highest MPD, in F*-M89, is suggestive of a longer period of isolated evolution of this
HG. HG H*-M69 (MPD 13.457) shows a very high MPD with multimodal peaks,
suggesting that this paragroup might contain lineages defined by as yet unidentified
markers. HG C5-M356 (MPD 12.712) showed multimodal peaks, indicating either
multiple sources of YSTRs or loss of YSTRs by drift. HGs J2b-M221/102 (MPD 8.960)
and L3*-M357 (MPD 5.510) show multiple peaks, indicating more than one source of YSTRs.


4.1.3.7 BATWING age estimates 


The BATWING phylogenetic tree of the 13 populations studied showed a
coalescence time of ~9 Kya (Fig 29). The population split times were deeper for the
populations of Karnataka (~4-7 Kya) than for those of Gujarat and Maharashtra. This
indicated that the populations have been isolated for a longer time and that each is
unique. However, Bunts and Billava had a recent split time of 2,241 Ybp. The pattern
of clustering in the BATWING tree was different from that of the NJ trees. The
ancestral effective population size was found to be 21,727 (95% CI: 21,158-21,924)
and the TMRCA was found to be 80,240 Ybp (95% CI: 79,486-90,809).




Figure 28a-28k: Mismatch Distribution analysis based on YSTRs within a NRY HG for
Karnataka study populations
[Plots not reproduced; vertical axis: relative frequency. Panels and unweighted mean pairwise differences (MPD): HG C5-M356, MPD 12.712; HG F*-M89, MPD 13.117; HG H*-M69, MPD 13.457; HG H1a*-M82, MPD 6.797; HG H2-Apt, MPD 9.686; HG J2a*-M410, MPD 11.441; HG J2b-M221/102, MPD 8.960; HG R1a1a-M17, MPD 7.876; HG L1-M27/76, MPD 7.716; HG R2-M124, MPD 8.753; HG L3-M357, MPD 5.510.]


Figure 29: BATWING based phylogenetic tree for populations of Karnataka
[Tree not reproduced; node ages are not recoverable from the scanned page. Legible tip labels include Yerava, Jenukuruba, Billava, Kodava, Mogaveera, Brahmin Goudsaraswath, Koraga, Brahmin Havyaka and Adikarnataka.]

Note: Numbers represent ages in Kya.


The population expansion time was 52,276 Ybp (95% CI: 51,342-54,771)
(Table 8). It was interesting to note that the population expansion time was much
smaller than the TMRCA, with non-overlapping confidence intervals.

The overall results therefore suggest that the populations of Karnataka showed no
one-to-one correlation with language, geography or other social characters and have
been isolated since at least ~7 Kya.




Results 4.1.4: Godavari Delta of Coastal Andhra Pradesh - an ancient fertile
river-fed settlement of Deccan.

The Godavari delta of Andhra Pradesh was sampled for the study and the
sampling areas are shown in Fig 1d (Andhra Pradesh). Ethnographic notes on the studied
populations are presented in Table 1d. Eleven caste and 2 tribal populations, totalling 744
male volunteers, were studied for their NRY polymorphisms. All the samples
belonged to West Godavari, East Godavari and the coastal belts of northern Andhra
Pradesh. Below Orissa, this is the most fertile alluvial belt, with rice cultivation, and in
the context of Pataliputra and Kalinga this region assumes significance in the trickling
down of populations along the east coast; hence the present study concentrated on
this region (Table 1).


4.1.4.1 Y Haplogroup frequency distribution: 


An overall frequency of 26.7% of NRY HG H1a*-M82 was the highest
observed in the total population of Andhra Pradesh. NRY HGs R1a1a-M17 and L1-
M27/76 were present at a frequency of ~12%, whereas HGs F*-M89, J2a*-M410, O2a-
M95 and R2-M124 were present at frequencies > 5%. These HGs put together
contributed 79.5% of the overall frequency among these populations (Table 3, 4d,
Fig 30).

However, the frequency distribution varied very widely among the various castes
and tribes. HG H1a*-M82, though ubiquitous among all the populations, was
over-represented in Relli (51.1%, FET 2.E-07), followed by Mala (46.3%, FET 6.E-05).
NRY HG R1a1a-M17 was present at a high frequency in Brahmin Andhra-Neyogi and
Vaidiki (ANV) (55%, FET 4E-06) as compared to other populations. Yadava, Raju,
Kamma and Kapu showed ~20% of NRY HG L1-M27/76, but the over-representation was
highly significant in Kamma and Kapu (23.1% and 24.4%; FET 6E-04 and 2E-04
respectively). HG F*-M89 was




Figure 30: Pie charts depicting NRY HG frequencies in the populations of Andhra
Pradesh
[Map not reproduced.]

Note: The pie charts are placed in the regions (coordinates) of sample collection. The various
NRY HGs are represented by different colours indicated in the key. The number of samples
studied is presented in Table 2. The populations were sampled mainly from the coastal and
Godavari delta regions.


present in many populations in appreciable frequencies (~10%); Raju and Konda 
Reddy, however, showed high F*-M89 (31%, FET 1E-07 and 21%, FET 9E-03 
respectively). The two tribal populations studied, Konda Reddy and Konda Kammara, 
showed a considerable proportion of NRY HG O2a-M95 (27% and 32.6%; FET 5E-08 
and 8E-05 respectively). Interestingly, Dravida Brahmins showed 24.3% of NRY HG 
G-M201 (FET 2E-09) and 27% of HG J2a*-M410 (FET 4E-06). 
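
As an illustration of how FET values of the kind quoted above can be obtained, the 
sketch below computes a two-sided Fisher exact test for a 2x2 table of carriers and 
non-carriers of a HG in one population against the remaining populations. The function 
name and the counts in the usage lines are hypothetical and are not taken from the 
study data; the software used for the reported FET values may apply a different 
definition of the two-sided test. 

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: carriers / non-carriers of the HG in the population of interest
    c, d: carriers / non-carriers of the HG in all other populations
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of observing x carriers in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: 46 of 90 carriers in one population, 150 of 650 elsewhere
p_value = fisher_exact_two_sided(46, 44, 150, 500)
print(f"FET p-value: {p_value:.1e}")
```

A small p-value in such a table indicates that the HG is over (or under) represented in 
the population of interest relative to the rest of the sample. 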

Nei gene diversity was almost the same in caste and tribal populations 
(0.8668±0.0073 and 0.8498±0.0237 respectively). 
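
The gene diversity figures above can be related to HG counts through the usual 
unbiased estimator h = n/(n-1) x (1 - sum of squared HG frequencies). The following is a 
minimal sketch under that assumption; the function name and the counts are 
hypothetical, and the standard errors reported above are not reproduced here. 

```python
def nei_gene_diversity(counts):
    """Unbiased gene diversity from a list of haplogroup counts.

    h = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the frequency of HG i.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical HG counts for one population (not study data)
example_counts = [24, 11, 9, 6, 5, 3, 2]
print(round(nei_gene_diversity(example_counts), 4))
```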
4.1.4.2 Neighbour Joining Tree 

NJ trees based on Fst and Rst distances represent the evolutionary 
relationships among the populations (Fig. 31a, 31b). The SC populations (i.e., Mala, 
Madiga, Jalari and Relli) and the tribal populations (i.e., Konda Reddy and Konda 
Kammara) formed distinct clusters. The wet/dry land farming populations 
(Kamma, Kapu, Yadava) clustered well in both the Fst and Rst distance 
matrices. Settibalija, another wet/dry land farming population, was placed at the upper 
arm of the core cluster of Kamma, Kapu and Yadava. The two Brahmin populations 
studied, Dravida and ANV, both from East Godavari, were quite distinct from each 
other, and they clustered separately with wet/dry land farming populations in both the Fst 
and Rst based trees. This was reflected in their HG composition as well, indicating 
different paternal histories. Raju, a warrior class, stood distinct. The NJ trees obtained 
based on YSNP and YSTR distances were similar. 
4.1.4.3 AMOVA 

The AMOVA computations were performed for various groupings based on language 
families (IE and DR), caste-tribe divide, geography and subsistence 
(Table 14). High Fct and low Fsc values were obtained when the populations were 




Figure 31a: NJ tree based on NRY HG Fst distances for Andhra Pradesh study 
populations 

[Tree figure; scale bar 0.01. Population labels recoverable from the scan, with sample 
sizes: Mala (82), Relli (90), Madiga (36), Jalari (73), Konda Reddy (37), Konda 
Kammara (46), Brahmin Dravida (37), Settibalija (81), Brahmin ANV (20), Yadhava (33), 
Kamma (91)] 

Figure 31b: NJ tree based on NRY STR Rst distances for Andhra Pradesh study 
populations 

[Tree figure; scale bar 0.005. Population labels recoverable from the scan, with sample 
sizes: Relli (90), Jalari (73), Mala (82), Madiga (36), Konda Reddy (37), Konda 
Kammara (46), Brahmin Dravida (37), Brahmin ANV (20)] 

Note: The numbers within brackets () indicate the sample size studied for each 
population. The branch lengths indicate the genetic distance between internal nodes. In 
both the trees, it is to be noted that the populations are divided into five groups: 
1. Mala, Relli, Jalari and Madiga, representing the SC populations 
2. Konda Reddy and Konda Kammara, representing the tribal groups 
3. Raju, a warrior group of Andhra Pradesh 
4. Yadava, Kapu and Kamma, representing the wet/dry land farmers 
5. The Brahmin related groups, which are distant from each other 


Table 14: NRY based AMOVA of the various subgroupings of the Andhra Pradesh study 
populations 

[Table body not recoverable from the source scan] 

Note: 
1. IE: Indo-European language family. All Brahmin related populations were considered to 
be IE (Sanskrit) speakers irrespective of their geography. Relli is an IE speaking population 
2. Geography based on kilometers 
3. Subsistence based groups include wet land farmers, scheduled caste (SC), food foragers, 
warrior and Brahmin related 


grouped based on their subsistence, for both YSNP and YSTR (Fct: 0.0572 and 
0.0327; Fsc: 0.0236 and 0.0167 for YSNP and YSTR respectively). The Fst value 
based on this grouping was also high (0.0795 and 0.0489 for YSNP and YSTR 
respectively). Based on subsistence, the populations were grouped as Brahmin related, 
wet/dry land farmers, SC populations, food foragers and artisans. The among populations 
within groups variance (Fsc) values were smaller as compared to the variation among 
groups (Fct) when the study populations were grouped based on caste-tribe divide, 
language and geography. 
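
In a three-level AMOVA the fixation indices are related by 
(1 - Fst) = (1 - Fsc)(1 - Fct). The short sketch below uses this standard identity to check 
the values quoted for the subsistence based grouping, taking 0.0572 and 0.0327 as the 
among group component (Fct). The helper name is hypothetical. 

```python
def fst_from_components(fct, fsc):
    """Overall Fst implied by Fct (among groups) and Fsc (among populations within groups)."""
    return 1.0 - (1.0 - fsc) * (1.0 - fct)


# Values quoted in the text for the subsistence based grouping
fst_ysnp = fst_from_components(fct=0.0572, fsc=0.0236)
fst_ystr = fst_from_components(fct=0.0327, fsc=0.0167)

print(round(fst_ysnp, 4))  # compare with the reported 0.0795
print(round(fst_ystr, 4))  # compare with the reported 0.0489
```

Both implied values agree with the reported Fst to four decimal places, which supports 
reading the first pair of values as Fct rather than Fst. 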

Table 15 shows the pairwise Fst and Rst distances of the study 
populations from Andhra Pradesh. The genetic distance between the two Brahmin 
populations, and between them and the other populations, was high. Konda 
Kammara, a tribal population, was at minimal genetic distance from Konda 
Reddy, another tribal population, and from the SC community Jalari. The wet/dry land 
farming populations (Yadava, Settibalija, Kamma and Kapu) shared minimal Fst and Rst 
distances in comparison to other populations. 
4.1.4.4 Principal Component Analysis 

Fig. 32a shows the PCA plot coloured based on mode of subsistence. PC1 and 
PC2 account for 33.7% and 27.3% of the variance respectively. The 
SC populations (Jalari, Relli, Mala and Madiga) were differentiated along the HG 
H1a*-M82 vector. Konda Kammara, a tribal population, was differentiated along HG 
O2a-M95. The Brahmin ANV were differentiated by the R1a1a-M17 vector. The Brahmin 
Dravida and the warrior group Raju were differentiated by the HG G-M201 and F*-M89 
vectors respectively. The major wet/dry land farming groups (Kapu and Kamma) were 
differentiated along the HG L1-M27/76 vector. Fig. 32b shows the scree plot 
representing the variance contributed by each PC. 




Table 15: Pairwise distance matrix based on NRY HG Fst and YSTR Rst distances for 
Andhra Pradesh study populations 

[Table body not recoverable from the source scan. Blocks: NRY HG Fst distances; NRY 
STR based Rst distances] 

Note: Lower triangle represents the NRY HG matrix and upper triangle represents the 
YSTR Rst matrix 


Figure 32a: Principal Component Analysis based on YHG frequencies of study 
populations from Andhra Pradesh 


[Biplot; horizontal axis: PC1, vertical axis: PC2. Legend: Wet/dry land farmers, 
Tribes-Foragers, SC, Brahmin related, Warrior] 


Figure 32b: Scree plot for PCA (Figure 32a) components 

[Scree plot; horizontal axis: Principal Components (PC1 to PC12), vertical axis: 
Proportion of Variance %] 

Note: The tribal populations are indicated by squares, and circles represent the caste 
populations. The percent variance contributed by each PC is given by the scree plot. The 
biplot shows the variance contributed by each HG, represented by lines as component 
loading vectors. It is to be noted that the study populations of Andhra Pradesh are 
clustered based on their mode of subsistence. 


4.1.4.5 Multidimensional scaling: 


Fig. 33 shows the MDS plot, which is coloured based on mode of 
subsistence. The stress value obtained for three dimensional NM MDS was 10.61911. 
This plot showed a clear distinction of wet/dry land farming groups, SC populations, 
Brahmins, Warrior and tribal related groups. The clustering pattern for wet/dry land 
farming populations was similar to that of PCA. Among the SC populations, Relli and 
Jalari were distant from the other SC populations, Mala and Madiga. This 
showed that these populations had a similar NRY HG profile but distinct YSTR 
signatures. Such a scenario could arise when populations undergo long term isolation 
and YSTR evolution, or through admixture of migrant groups with similar HGs but 
different YSTR signatures. The Brahmin and Raju populations were more distant from 
each other and from the other populations than in the PCA. 
4.1.4.6 Phylogenetic Networks 
NRY HG C5-M356: This HG was sporadically present in the study populations. No 
population specific clusters were found. The network was characterised by long 
radiating branches, indicating a distant source and limited YSTR evolution, or loss of 
haplotypes due to drift. 

NRY HG F*-M89: This network (Fig 34a) was characterised by the presence of the 
warrior class, Raju, at the middle of the network. They also showed a population 
specific cluster with single step mutations, indicating isolation and YSTR evolution 
among them. The other study populations did not show such patterns of YSTR 
evolution and probably had different sources of the haplotypes. 

NRY HG H1a*-M82: This HG was represented in the majority of the study populations. 
The network showed a hypothetical central node with radial branches (Fig 34b). 
Population specific clusters were observed among Relli (SC) and Jalari (Fishermen, 




Figure 33: Multidimensional scaling of NRY STR Rst distances for populations of 
Andhra Pradesh 

[MDS plot; horizontal axis: Dimension 1. Population labels: BANV, BDr, KonR, KonK, 
Yad, Kap, Kam, Sett, Mad, Raju, Mal, Rell. Legend: Wet/dry land farming, 
Tribes-Foragers, SC, Brahmin related, Warrior] 


Note: The tribal populations are indicated by squares and caste populations are indicated by 
circles. The YSTR based proximity of the populations with respect to their mode of 
subsistence is indicated by the MDS plot. Tight clusters were formed among the wet/dry 
land farmers and the tribal populations. The Brahmin groups were diverse. The SC 
populations were distributed widely, similar to the PCA. 


Figure 34a-34e: Phylogenetic networks for HGs F*-M89, H1a*-M82, L1-M27/76, O2a-M95, 
R1a1a-M17 


Figure 34a: NRY HG F*-M89 

[Network figure; legend: Brahmin Dravida, Kamma, Kapu, Konda Kammara, Konda Reddy, 
Madiga, Mala] 


Figure 34b: NRY HG H1a*-M82 

[Network figure; legend: Brahmin Dravida, Brahmin Neyogi, Brahmin Vaidiki, Jalari, 
Kamma, Kapu, Konda Kammara, Konda Reddy, Madiga, Mala, Raju, Relli, Settibalija, 
Yadhava] 


SC). Haplotype sharing was observed between one sample each of Mala and Madiga 
(both SC), two samples of Raju (warrior) and Relli (SC), and one sample each of Mala 
and Yadava. This network provides evidence for gene flow among populations. 

NRY HG J2a*-M410: This network showed long branches with multiple unoccupied 
steps and no central median node. This HG was over represented in the warrior class, 
Raju, with YSTR evolution among them. Brahmin Dravida also showed high 
representation but, unlike Raju, did not show isolated YSTR evolution. They 
probably had a distant source for these haplotypes. 

NRY HG J2b-M221/102: The haplotypes of this network did not show any 
population specific cluster. However, the HG was over represented in SC populations 
(Mala and Madiga) and wet/dry land farmers (Kamma and Settibalija). No haplotype 
sharing was observed among these populations, indicating that there could be different 
sources for this HG. 

NRY HG L1-M27/76: This network was characterised by a hypothetical central median 
node, with branches radiating around it (Fig. 34c). The network was mainly 
composed of wet/dry land farmers (Kapu and Kamma). Kapu showed shorter branch 
lengths and stepwise mutations, indicating long term isolation and evolution of their 
YSTRs. Kamma were represented by long branches with multiple unoccupied steps, 
indicating either a different source for their haplotypes within HG L1 or genetic drift. 
Raju formed a population specific cluster with YSTR evolution and minimal sharing of 
their haplotypes, indicating evolution of their YSTRs. 

NRY HG O2a-M95: Three distinct clusters were identified in the network: the tribal 
populations Konda Kammara and Konda Reddy, and the SC population Jalari. There was 
no YSTR sharing among these populations, showing evolution in different directions. All 




Figure 34c: NRY HG L1-M27/76 

[Network figure; legend: Brahmin Neyogi, Brahmin Vaidiki, Kamma, Kapu, Konda Reddy, 
Madiga, Mala, Raju, Relli, Settibalija, Yadhava] 


Figure 34d: NRY HG O2a-M95 

[Network figure; legend: Jalari, Kamma, Kapu, Konda Kammara, Konda Reddy, Madiga, 
Mala, Relli, Settibalija. Labelled clusters: Konda Kammara, Konda Reddy] 


these features indicated that there could be different sources of haplotypes for these 
populations (Fig. 34d). 

NRY HG R1a1a-M17: The central node of this network was occupied by Kapu, a 
wet/dry land farming population (Fig 34e). Kamma, another wet/dry land farming 
population, formed a population specific cluster. Linear branches with multiple steps 
of evolution along the network showed YSTR evolution and spread of this HG. 

NRY HG R2-M124: The haplotypes of HG R2 were spread across all the study 
populations. No population specific clusters were identified. The majority of the 
populations showed long radial branching with multiple unoccupied steps from a 
hypothetical central node. 


4.1.4.7 Mismatch Distributions 


Mismatch distribution analysis of NRY HG F*-M89 (MPD: 11) showed 
bimodal peaks and a high MPD value, indicating that there may be unidentified 
haplogroups within this paragroup (Fig 35a-35j). These populations have at least two 
different sources of haplotypes for this paragroup. HG C5-M356 (MPD: 9.5) 
showed multimodal peaks, indicating diverse sources or loss of haplotypes by drift. 
HGs such as J2a*-M410 (MPD: 10.1), J2b-M221/102 (MPD: 9) and O2a-M95 (MPD: 
7.05) also showed multiple peaks and high MPD values. These haplotypes could 
have had multiple sources within the haplogroup. HGs H1a*-M82 (MPD: 8.1), 
L1-M27/76 (MPD: 7.8) and R1a1a-M17 (MPD: 7.4) showed a unimodal peak, indicating 
long term demographic expansions among the populations of Andhra Pradesh within 
these NRY HGs. 
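
As an illustration of how a mismatch distribution and its mean pairwise difference 
(MPD) can be derived from YSTR haplotypes, a minimal sketch follows. It assumes that 
the difference between two haplotypes is counted as the number of loci at which their 
repeat numbers differ; the software used in this study may define the distance 
differently. The function name and the toy haplotypes are hypothetical. 

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(haplotypes):
    """Return (distribution, mpd) for a list of equal-length YSTR haplotypes.

    distribution maps each number of differing loci to its relative frequency
    among all pairwise comparisons; mpd is the mean pairwise difference.
    """
    diffs = [
        sum(x != y for x, y in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    ]
    total = len(diffs)
    counts = Counter(diffs)
    distribution = {k: counts[k] / total for k in sorted(counts)}
    mpd = sum(diffs) / total
    return distribution, mpd


# Toy three-locus haplotypes (not study data)
toy = [(14, 12, 22), (14, 12, 22), (15, 12, 22), (15, 13, 23)]
dist, mpd = mismatch_distribution(toy)
print(dist, round(mpd, 3))
```

A single peak in such a distribution corresponds to the unimodal pattern described 
above, while two or more separated peaks correspond to the bimodal and multimodal 
patterns. 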


4.1.4.8 BATWING and ASD analysis 


The populations of Andhra Pradesh showed a coalescence time of ~8.2 Kya. 
The total ancestral effective population size was 13,701 (95% CI: 12,956-14,157) 




Figure 34e: NRY HG R1a1a-M17 

[Network figure; legend labels recoverable from the scan include Brahmin Neyogi, 
Brahmin Vaidiki, Jalari, Kamma, Kapu, Konda Reddy, Madiga, Mala, Raju and Yadhava] 


Figure 35a-35j: Mismatch distributions of YSTRs for HGs for study populations 
of Andhra Pradesh 

[Mismatch distribution plots; vertical axis: Relative frequency. Panel titles recoverable 
from the scan: 
HG C5-M356, MPD: 9.518 
HG J2a-M410, MPD: 10.159 
HG F*-M89, MPD: 11.039 
HG H1a*-M82, MPD: 8.147 
HG J2b-M221/102, MPD: 9.004 
HG L1-M27/76, MPD: 7.816 
HG O2a-M95, MPD: 7.055 
HG R1a1a-M17, MPD: 7.404 
HG R2-M124, MPD: 9.092] 


(Table 8). The TMRCA was found to be 61,352 Ybp (95% CI: 59,131-64,775) 
and the population expansion time was 32,671 Ybp (95% CI: 26,388-35,167). 
In the phylogenetic tree computed by BATWING (Fig 36), the Konda Reddy tribe 
showed a distinct line of evolution from 8.2 Kya. The Brahmin populations of Andhra 
Pradesh showed a coalescence time of ~5 Kya. Kamma and Jalari showed a recent split 
time of 1.7 Kya, whereas the wet/dry land farming populations Kapu and Yadhava 
showed a relatively deep split time of 3.3 Kya. The SC populations (Mala and Madiga) 
also showed a recent split, at 2.4 Kya. 

The two Brahmin populations had only one HG (R1a1a-M17) in common. 
The ASD estimates were markedly different between them (Brahmin ANV: 12.1±3.4 Kya 
and Brahmin Dravida: 3.4±1.4 Kya) (Appendix 11), suggesting that these two 
populations had their own unique histories, which was also supported by the gene 
frequencies, PCA and MDS plots (Fig 33, 34). HG H1a*-M82 was present in the 
majority of the populations of Andhra Pradesh. The ASD estimates for HG H1a*-M82 
differed among populations, with three of them showing >20 Kya, four 
showing 10-20 Kya and the others showing <10 Kya. This indicated long term 
expansions and markedly different sources of this HG in Andhra Pradesh. 
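
The general idea behind an ASD (average squared distance) age estimate can be sketched 
as follows: the squared difference in repeat number between each haplotype and a 
founder haplotype is averaged over samples and loci, and the result is divided by a per 
locus mutation rate to give an age in generations. In this sketch the founder is assumed 
to be the per locus median, and the mutation rate and generation time are parameters 
whose values in the usage lines are purely illustrative; they are not the values used 
in this study, and the function name is hypothetical. 

```python
from statistics import median_low


def asd_age(haplotypes, mutation_rate, generation_years):
    """Return (asd, age_in_years) for a list of equal-length YSTR haplotypes.

    The founder haplotype is taken as the per-locus (low) median, which is an
    assumption of this sketch. mutation_rate is per locus per generation.
    """
    n_loci = len(haplotypes[0])
    founder = [median_low([h[i] for h in haplotypes]) for i in range(n_loci)]
    squared = sum(
        (h[i] - founder[i]) ** 2 for h in haplotypes for i in range(n_loci)
    )
    asd = squared / (len(haplotypes) * n_loci)
    generations = asd / mutation_rate
    return asd, generations * generation_years


# Toy two-locus haplotypes and illustrative parameters (not study data)
toy_haplotypes = [(14, 12), (14, 12), (15, 12), (14, 13), (13, 12)]
asd, age_years = asd_age(toy_haplotypes, mutation_rate=0.001, generation_years=25)
print(round(asd, 3), round(age_years))
```

Because the estimate scales inversely with the mutation rate, the choice of rate has a 
direct effect on the absolute ages, while the ranking of populations by ASD is unaffected. 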

Overall, the results suggest that the populations of Andhra Pradesh were 
differentiated based on their mode of subsistence, with no gene flow detected for at 
least 2000 years. 




Figure 36: BATWING based phylogenetic tree for study populations of Andhra Pradesh 

[Tree figure; population labels recoverable from the scan include Konda Reddy, Brahmin 
ANV, Kapu and Brahmin Dravida; node values not recoverable] 

Note: The numbers represent the ages in Kya 


4.2: NRY HG L1-M27/76 Story and India 

The results described in the previous section on the populations from the Deccan 
region of India have shown HGs R1a1a-M17, H1a*-M82 and L1-M27/76 to be 
the most common HGs. As the present study region is mainly inhabited by Dravidian 
speakers, I performed a Pan Indian analysis of HG L1-M27/76, which has 
previously been described as a marker for Dravidian speakers of India by Sengupta et al. 
(2006), so as to decipher the origin of these speakers. However, in the study by 
Sengupta et al. (2006), the representation of samples from various geographic regions 
of India was not appreciably large (L1-M27 sample size was 55). In the present study 
I investigated 611 NRY HG L1-M27/76 chromosomes from a total of 5,099 samples 
studied under the Genographic project from this laboratory; 212 samples came 
directly from my study. 
4.2.1: Distribution of NRY HG L1-M27/76 

The distribution of NRY HG L1-M27/76 in the Indian populations was 
analysed from the 'Genographic' study (present study: 611, studied by others: 28). 
The Pan Indian data compiled from the Genographic Indian centre showed an over 
representation of samples from Tamil Nadu, where the sample size was almost twice 
the total samples studied from other states of India (Table 16a, b). Hence, to reduce the 
effect of uneven sample size on data analysis, I randomly reduced the Tamil Nadu 
samples from each population to one third. Thus the total N studied from Tamil Nadu was 
reduced from 1,356 to 344, and hence a total of 404 L1-M27/76 chromosomes from 
4,772 samples were studied from across India. Table 16a shows the L1-M27/76 
frequency from the truncated Tamil Nadu data and those of other parts of India. A 
predominant presence of HG L1-M27/76, with a frequency of >0.5, was observed in 
Piramalai Kallar (N=24), a frequency of 0.2-0.4 in 20 study populations and 0.1-0.2 in 




Table 16a: Frequency distribution of HG L1-M27/76 in India and literature data 


N 
L 1 + 
Location Population pa oven Subsistence studied Nee ss P 
ranking truncate L1 | frequency | value 


Ps [2 | ome |_| 
a a ST 
Gadi asi | 1 | Tribe [Wet land farming | 25 | 2 | ono | 

Himachal Guar usin | 1E | Tbe [Domenioaion | _o0 | 1 | oor [3-0] 

te [caste |Warrir =i 

te [Caste |Wwer and Trming [46 

ef caste faratmin | t . 

1 [Caste Jory anfarning | 16 | 1 | 00605 |_| 

te [Caste [Dry ana arming | 66 | + | 0.0606 | | 

1 [Tribe fry tana farming | 11 | 1 | 0.0909 |_| 

= 

= 

= 

[es 


wm 
&. 
no} 
5 
Q 
2 
Q 
Qa 
=v 


Punjab 
Brahmin Saraswath 


Brahmin Goud 


[tite [Forages Sid) oy | 
[tribe [Forages | 2 | 3 | 0250 | 
tite fForages ——OSCid se | os | 
iE [caste [Prietiood | | 3 | ons | | 
AN 


Rajasthan 
eena 


Brahmin Saraswath 


Seharia 


Pt | Caste [agricuture | se [1 | oor |_| 
1 [Caste [ry ana tarning | 29 | 1 | oss | | 
1 [Caste Jory an farming | 15 | 2 | ass |_| 
1 [trite [ry na farming | _72_| 6 | 

[ie [caste frishing +19 | 3 | ase |_| 
=a 

TAA 

repr 


Jain_Rajasthan 
Jatt Rajasthan 


Kathodia 
Gujarat 


3 
caste [Dry lindane | 7 [| 1 | oma |_| 
[tribe [Forages ———~+Y 401 | omso |_| 
Kola cor | tribe [ory tan farning | 27 | 5 | oaes2 | 
Wat ae [teen mating ae TSP ous 
i ese ee os 
Brain Chgpvan [1 | Caste [Brahmin | 28 | s | ores | 
hangar | te | Caste [pastor | 40 | 2 | sto | 
Maratha | 18 | Caste [Warior | 3 | 3 | coos | 
Brain Goudkaraswal TE | Caste [Pviestiood | 94 | 1 | on | 
Brain Howyaka [1 | Caste [Wetland arming | ee | 1 | core [1E03 
Foon | 
Adikamaaks | sr* | Caste [Wet and tanning | o# [9 | oxao6 | 
pilava | SDR | Caste [Wetland farming | se | ¢ | oso | 
Gowds | Sor | Caste [Agricuture | #1 | 9 | on | 
Koda | SDR | Caste [Wet land fanning | 50_| 7 | oxao0 | 
a 
hegoesa Jaa ncaa one = oe 
uruba | SDR | Caste [Pastor | 15] 3 | 00s | 


Rajput Himachal 
Meghwal 
Meghwal 


Subsistence 
value 


N+ for Ll 
L1 | frequency 


1 0.0278 


| 6 | 0.162 
La 


Caste 


Wet land farming 


N 
N 


Caste 4 0.363 2.E-0 


0.083 


0.082 


| 6 | 
| 8 | 0222 
| 3 | 


3 0.300 


0.250 


0.333 


Ee 
0.152 
| 6 | 
ae 


Tribe 
Caste 
Caste |Agriculture 
Caste _|Pastoral 


Caste 


Ww 


ge 
i 
Vaan 
Caste | Wet land farming 
Caste _|Warir 
: [Nattukottai Chettiar | SDR | Caste 
: 
: 
IE 


Foragers 3 0.150 


Foragers 0.333 


0.080 


| | 
ze 
| 2 | 0.3158 
| 2 | 
za 


Foragers 
Foragers 

Dry land farming 
Artisan 

Dry land farming 


0.088 


1 0.083 


2 
2 
2 

| 8 | 0.2722 |7. 
2 
2 
2 


m 
So 
w 


L ial 


re 
So 
a 


Traders 

SC 

C 

Cc 

Dry land farming 


0.4667. |3. 
0.087 


<|< 2/o/e ¥ 
2/18 Ble le 9 
@ |S s/3 |s > 
12 |= /E|° 2 |" [2/2 
3 = 
fo} 
8 
a 


0.217 
0.138 
0.534 
1.000 1. 


S 


Caste 
Tribe 
Caste 
Caste 
Caste 
Caste 
Caste 
Caste 
Caste 
Tribe 
Caste 
Caste 
Caste 
Caste 
Caste 
Tribe 
Caste 
Tribe 


b 
ea 
So 
an 


re 
oS 
N 


Various 
Artisan 
Artisan 
Artisan 
Dry land farming 


Oo Oo 
N N 
oS —_ 
n = 
Re 1S JO JO JF IO JN ITN JW Ih SoS IWISo [Re | SIO IN IN [WIS 


0.096 


elt 


0.105 
Yadhava TN 


Andhra 
Pradesh 


dN 


Dry land farming 
Priesthood 

SC 

Wet land farming 
Wet/dry land farming 
Wet land farming 


0.2941 | 1.E-0 
3 0.1500 


0.0111 
0.2308 
0.2444 
0.1358 
0.1081 
0.2121 
| 1 | 


1 0.0278 


0.061 
: 
0.032 


pi 

| 1 | 0.013 
0.025 
0.055 
0.1111 


eal 
zi 
eal 
| 2 | 
| 8 | 
Le] 
Ee 


1 
1 
1 
1 


ee 
Ww 


w 

mM) 
Solol[so 

BK 


Foragers 
Pastoral 
Cc 
Cc 


: 
N. Orissa = 
Brahmin Utkalya 
Tribe 


Gond_MP 
| 
Chattisgarh 


S 
Ss 


~_ 
a 
nn 


Ww 


4.E-03 


Oo 


3 
3 
7 
3 
2 
2) 
2 
2 
2 
2 
2 
3 
2 
3 
2 
2 
9 
3 
3 
3 
3 
7 
Priesthood/agriculture 6 
3 


— 
— 


So oO Oo 
So N N 
n N Ww 
N & an 
HDnIN [rn ee |O a 


N 
6 
7 

12 
3 
6 

10 

4 
0 
8 
0 

3 

13 
3 

12 
9 
6 

12 
2 

15 

2 

10 
1 
9 
5 
0 
0 
1 

81 
7 
3 
6 

82 

58 
1 
gi 
4 
9 

54 

18 

50 


Slelyly Ss) 
a fg fg fa |5|5 
BIB |/8/8 Bilo|?o 
ajajayj;a Q 
SilelsS|S Ry 
SIS |S )5 a 
ge joe |0Q |G09Q ga 


L iat N+ f 
ae ee 


assem [katte «(| E [case [Agricuitre = 10a 9 | ooees |_| 
Arunachal fusimi [rw | Tribe [Agricutnre | _a1_| 1 | oars |_| 

lar | 8 | 67 | 
Afghanistan 


Batush | ts | se | 

rajik | EP | rr | 

bee | tt | sss | 
po ts | 02000 _ | 


Pakistan 


a ee | EAL 


Note: 

1 IE: Indo European language 
2 AA: Austro Asiatic 
3 CDR: Central Dravidian languages 
4 SDR: South Dravidian languages 
5 TB: Tibeto Burman language 
N+: Number of samples that have the derived allele for the mutation M27 and M76 


Table 16b: Frequency of Tamil Nadu populations after truncating 

[Table body not recoverable from the source scan] 


28 study populations, whereas 56 populations showed a frequency of 0.9 and above. HG L1 
was seen predominantly in caste populations rather than in tribes (FET 1E-15). 
Linguistically, it can be associated with the South Dravidian languages (SDR) (Table 17). 
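
The per population truncation of the Tamil Nadu samples described in this section (random 
reduction of each population to one third) can be sketched as below. The function name, 
the fixed seed and the rounding rule (nearest whole number, with at least one sample 
kept) are assumptions made for illustration and are not stated in the text. 

```python
import random


def truncate_to_one_third(samples_by_population, seed=0):
    """Randomly keep about one third of the samples of every population.

    samples_by_population maps a population name to a list of sample IDs.
    The seed only makes this illustration reproducible.
    """
    rng = random.Random(seed)
    truncated = {}
    for population, samples in samples_by_population.items():
        if not samples:
            truncated[population] = []
            continue
        keep = max(1, round(len(samples) / 3))
        truncated[population] = rng.sample(samples, keep)
    return truncated


# Hypothetical sample IDs (not study data)
toy_samples = {
    "PopulationA": [f"A{i}" for i in range(9)],
    "PopulationB": [f"B{i}" for i in range(4)],
}
reduced = truncate_to_one_third(toy_samples)
print({name: len(ids) for name, ids in reduced.items()})
```

Sampling within each population, rather than across the pooled state, keeps every 
population represented in the truncated dataset. 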

As a region, the Deccan showed the highest frequency of HG L1 compared to the 
northern Indian states. Tamil Nadu (L1-M27/76 frequency: 19.1%, FET 2E-10), 
followed by Andhra Pradesh (11.4%, FET 2E-17), Kerala (10.7%, FET 5E-04) and 
Karnataka (8.95%, FET 1E-09), all showed appreciable frequencies; select populations 
in fact had higher proportions (Table 16a, b). The high frequency of L1-M27/76 in 
Tamil Nadu could be attributed to the expansion of this haplogroup in this region or 
to recurrent migrations of people carrying this HG. The distribution of L1-M27/76 was 
represented in a contour map (Fig 37a). The highest frequency was identified in the Tamil 
Nadu region, and the frequency of this HG decreased as one moves north. 
4.2.2: L1-M27/76 17Y-STR variance distribution: 

The 17-YSTR variance averaged over all loci is represented in Fig 37b as 
a contour map. The highest YSTR diversity was observed in Tamil Nadu and, as with 
the frequency contour plot, the variance decreased as one moves north. 
To determine whether the variance observed was a result of multiple events of 
admixture/in-migration, the sum of squared differences (SSD) from the median was 
calculated for each region, and the frequency of haplotypes observed at each mutational 
step is represented in Table 18. 

The SSD estimate based on YSTRs shows the distances of haplotypes from the 
median haplotype. The median haplotype 
obtained from the study samples was considered as the founding haplotype (Sengupta 
et al., 2006). For each geographical region, the number of samples that were present 
at various distances from the median haplotype was counted and expressed as a frequency 




Table 17: Frequencies of HG L1-M27/76 in various language speakers of India, showing 
higher representation in South Dravidian and Indo-European language speakers 

[Table body not recoverable from the source scan] 

Note: 
1 IE: Indo European language 
2 AA: Austro Asiatic 
3 CDR: Central Dravidian languages 
4 SDR: South Dravidian languages 
5 TB: Tibeto Burman language 


Figure 37a: Contour map of NRY HG L1-M27/76 based on its frequency 

Figure 37b: Contour map of NRY HG L1-M27/76 based on YSTR variance 


(Table 18). Tamil Nadu, Andhra Pradesh and Karnataka each had one sample in the 
median. Karnataka populations showed an expansion of haplotypes with a distance 
range of 0-16, while Tamil Nadu samples showed a haplotype spread of 0-30 steps, 
the maximum identified in the study. The other regions of the Deccan, Maharashtra and 
Kerala, showed discontinuous distributions from 1-10 and 1-7 respectively. In contrast 
to the Deccan samples, the North India and Bihar regions showed a discontinuous and 
sporadic distribution at various distances. Further, 
haplotypes closer to the median were not found among them. 
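
The SSD procedure described in this section can be sketched as follows: a median 
haplotype is taken per locus, the sum of squared differences from it is computed for each 
sample, and the proportion of samples at each distance is tabulated. The per locus low 
median is used here as the founding haplotype, which is an assumption of this sketch; 
the function name and the toy haplotypes are hypothetical. 

```python
from collections import Counter
from statistics import median_low


def ssd_profile(haplotypes):
    """Return (median_haplotype, profile) for equal-length YSTR haplotypes.

    profile maps each SSD distance from the median haplotype to the
    proportion of samples found at that distance.
    """
    n_loci = len(haplotypes[0])
    med = tuple(median_low([h[i] for h in haplotypes]) for i in range(n_loci))
    distances = [
        sum((h[i] - med[i]) ** 2 for i in range(n_loci)) for h in haplotypes
    ]
    counts = Counter(distances)
    n = len(haplotypes)
    return med, {d: counts[d] / n for d in sorted(counts)}


# Toy two-locus haplotypes for one region (not study data)
region = [(14, 12), (14, 12), (15, 12), (14, 13), (13, 12)]
median_haplotype, profile = ssd_profile(region)
print(median_haplotype, profile)
```

A region whose profile is continuous from distance 0 upwards corresponds to the 
expansion pattern described for the Deccan, whereas gaps in the profile correspond to 
the discontinuous and sporadic pattern described for North India and Bihar. 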
4.2.3: L1-M27/76 Haplotype network analysis 

Several reduced median networks were computed to determine the relationship 
of L1-M27/76 haplotypes with each other. All the networks were computed using the 
reduced median algorithm with the reduction threshold set to 1. The first network was 
computed using all the 611 samples from the Genographic India study (Fig 38). In 
this network, Tamil Nadu samples were strewn all over the network; nonetheless, the 
centroid/median haplotype was formed by three samples from three distinct regions 
(Tamil Nadu, Karnataka and Andhra Pradesh). As explained previously, the Tamil 
Nadu samples were highly over represented. Hence the second network was computed 
with the reduced data (N=404) (Fig 39). The picture did not change much from the 
first network, but the north and west Indian L1-M27/76 lineages were well 
differentiated in this network, suggesting a unique line of evolution. The 17 STR 
haplotype networks, with or without truncated samples, showed a central median node 
comprising samples from Andhra Pradesh, Karnataka and Tamil Nadu (median 
haplotype 12,16,22,15,14,15,15,10,19,12,10,14,11,12,24,12,11 for YSTRs D389a, 
D389b, D390, D456, D19, D458, D437, D438, D448, DH4, D391, D392, D393, 
D439, D635, D388, D426). The median haplotype was constituted by Kapu (Andhra 




Table 18a: SSD distances of HG L1 computed from central median haplotype among 
various study states 

[Table body not recoverable from the source scan. Column heading: Percentage of samples 
that are distanced from central median haplotype. Row labels recoverable from the scan 
include Tamil Nadu (truncated), Karnataka and Andhra Pradesh] 

Table 18b: SSD distances of HG L1 computed from central median haplotype among 
various language families 

[Table body not recoverable from the source scan] 

SSD refers to Sum of square differences 
* AA and TB were eliminated from this analysis because of low sample size 


Figure 38: Reduced median phylogenetic network for all (N=611) HG L1-M27/76 Pan 
Indian populations (17 YSTRs) 

[Network figure; legend labels recoverable from the scan: Punjab, Assam, Chattisgarh, 
Gujarat, Himachal, Jammu, S. Orissa, N. Orissa, Andhra Pradesh, Bihar, Maharashtra, 
Uttar Pradesh, Rajasthan, Tamil Nadu, Karnataka] 

Note: The network was computed with all the samples. Each circle represents the sample. 
The size of the node is proportional to the frequency of the haplotype. The network was 
very much overlapping due to the high sample size of Tamil Nadu populations, but the 
discontinuity of North Indian populations can be deciphered. 


Figure 39: Reduced median phylogenetic network of HG L1 in Pan Indian populations 
(Tamil Nadu truncated: N=404) with 17 YSTRs 

[Network figure; legend labels recoverable from the scan include Uttar Pradesh, Punjab, 
North Orissa, Maharashtra, Karnataka, Jammu, Himachal Pradesh, Gujarat, Chattisgarh, 
Bihar, Assam, Arunachal Pradesh and Andhra Pradesh] 

Note: Circles represent haplotypes defined by 17 YSTRs. Area of the circle is proportional 
to the haplotype frequency. Colour represents study states. The network depicts 
evolution among South Indian states, with the North Indian cluster arising as a subset of 
the above. 


Pradesh), Kuruba (Karnataka) and Ezhava (Tamil Nadu) populations (Table 19). The 
other haplotypes radiate around this node in different tiers. The pattern was very clear 
in the Tamil Nadu truncated network (Fig 39). The presence of L1 samples from the 
Deccan in all radiating branches of the RM network makes the Deccan the candidate for 
the origin of L1. 

To determine the origin and expansion of this haplogroup, the Indian 
haplotypes were compared with those of the literature, such that L1-M27/76 
samples from regions not sampled in the present study were included. A set of 9 
common STRs for which haplotype data was available in the literature was compared 
with the Indian dataset by a network analysis (Fig 40). In the RM network, the 
Afghanistan, Pakistan and Syria populations shared L1 haplotypes at the second 
mutational step from the median. 90% of the median was composed of samples from the 
Deccan, 75% of them being from Tamil Nadu, Karnataka and Andhra Pradesh. Further, 
the IE speaking north Indian populations showed a discrete radiation from the median, 
with specific and well defined divergence on its radius. It is to be noted that the 9 
YSTRs that were compared with literature data once again showed only Indian, and 
specifically Deccan, samples (apart from 2 Assam samples) as the median haplotype. 

To determine whether the reduction of STRs (17 to 9) in the previous network created 
any bias in the analysis, another network with the remaining 8 STRs of the Indian 
samples (i.e., D456, D458, D437, D438, D448, DH4, D635, D426) was computed (Fig 41). 
In this network, the extent of sharing within the study states varied when a different set 
of YSTRs was considered. Importantly, the samples within 
the median changed, although the predominance of the Deccan region was still observed. 
Similarly, the North and West Indian populations formed a separate cluster. Hence 
with all the 17 YSTRs the stringency was high, eliminating the Maharashtra, Gujarat 


Table 19: Composition of the central median of the HG L1 network with various numbers of YSTRs considered

[Table columns: Region; Population; Language; Mode of subsistence; 9 YSTRs World; 9 YSTRs India; 8 YSTRs India; 17 YSTRs India; 8 YSTRs India, all Tamil Nadu samples. Header rows give the YSTRs used and the median haplotype for each network.]

[Figure 40: network of HG L1 for the Indian study populations and literature data (9 YSTRs)]

Note: This network was computed by comparing the data set used in the present study with those available in the literature, at a resolution of 9 YSTRs. The composition of the central median node is given in Table 19. It is to be noted that the north Indian data set is seen diverging as a separate cluster, as observed in the other networks.


Figure 41: Reduced Median Phylogenetic network of HG L1 with Indian study populations based on 8 YSTRs (N=404)

Note: This network is drawn based on the 8 YSTRs that were not used for comparison with the literature data, as they were not available in the literature. The composition of the median node is given in Table 19.


and Assam populations from the median. Thus the choice of YSTRs, and their
application in deciphering the origin of haplogroups, has to be dealt with cautiously;
the 17 YSTRs employed in this Genographic study were more informative than the 9
STRs used in earlier studies.


4.2.4: MultiDimensional Scaling 


An MDS plot was computed based on YSTR Rst distances for the 9 common
STRs of HG L1-M27/76 chromosomes obtained in the present study, along with those
available in the literature from Afghanistan, Pakistan and Syria (Fig 42).
Populations with a sample size of at least 3 were considered for the computation. A
stress value of 15.71 for three dimensions (k=3) was obtained. The Pakistan and Syrian
populations clustered along with the populations of the Deccan, India. The populations of
Tamil Nadu were widely distributed in the middle of the MDS plot, indicating diverse
STR evolution, whereas the populations of Karnataka formed a relatively tight cluster.
Most of the Brahmin populations were outliers. The high variance observed in the
south Indian populations argues for a long-term evolution of L1 STRs in them.
Unfortunately, 17 STRs were not available in the published literature to further refine
the affinity of the Indus valley and Syrian populations. The genetic distances of the
populations studied were further depicted in an NJ tree computed from Rst
distances (Fig 43).
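The stress value quoted for the MDS solution measures how far the inter-point distances in the fitted configuration depart from the input distances. The sketch below illustrates the idea only; it assumes the metric form of Kruskal's stress-1 (disparities taken equal to the input distances), which may differ from the statistic reported by the software used in this study, and the distance matrix and coordinates are hypothetical, not thesis data. The value reported above is presumably on a percentage scale, which is also an assumption.

```python
import math

def kruskal_stress1(dist, coords):
    """Stress-1 between input distances and distances in a fitted configuration.

    Assumption: disparities are taken equal to the input distances
    (metric form); non-metric MDS would use a monotone regression instead.
    """
    num = 0.0
    den = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            fitted = math.dist(coords[i], coords[j])
            num += (fitted - dist[i][j]) ** 2
            den += fitted ** 2
    return math.sqrt(num / den)

# Hypothetical pairwise distances among three populations (not thesis data).
rst = [[0.0, 3.0, 4.0],
       [3.0, 0.0, 5.0],
       [4.0, 5.0, 0.0]]

perfect = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]   # reproduces rst exactly
squeezed = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]  # distorts two of the distances

stress_perfect = kruskal_stress1(rst, perfect)
stress_squeezed = kruskal_stress1(rst, squeezed)
```

A configuration that reproduces every input distance has zero stress; the more the low-dimensional plot has to distort the distances, the larger the value becomes.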


4.2.5: STR based age estimates 


Tables 20 and 21 present the YSTR variance, ASD based age, effective
population sizes and population expansion time estimates of the study populations and
the literature data. The total HG L1 YSTR variance for 9 YSTRs was 0.35 and the ASD based
age was 14,127 ± 4,066 Ybp. High ASD estimates were observed in SDR speakers of Tamil
Nadu, followed by SDR speakers of Karnataka and Andhra Pradesh.


Figure 42: Multi Dimensional Scaling of HG L1 based on Rst distances of global
populations (9 YSTR)

Legend (regions): Afghanistan, Andhra Pradesh, Assam, Bihar, Chattisgarh, Gujarat, Karnataka, Maharashtra, N. Kerala, N. Orissa, Punjab, Rajasthan, South Pakistan, Syrian, Tamil Nadu, Uttar Pradesh.


Note: Populations with N>3 were used. Populations outside India are indicated by squares
and a diamond; Indian populations are indicated by circles. The populations represented by
the codes are listed in Appendix 16. It is to be noted that the majority of the Brahmin
populations fall towards the periphery of the cluster. The Tamil Nadu populations were
distributed more widely than the Karnataka populations.


Syrian populations showed an ASD estimate matching that of the Karnataka populations
(12,400 Ybp), but they exhibited low YSTR variance. Similarly, the South Pakistan and
Andhra Pradesh populations both showed ASD estimates of ~10,000 years, but again
the YSTR variance of Andhra Pradesh was 1.6 times higher than that of the Afghanistan
populations. Larger sample sizes from these regions are warranted. Among the Indian
study populations, IE speakers of Uttar Pradesh showed a higher variance (0.60) and
an ASD of 33,000 ± 18,300 Ybp (9 YSTRs). However, as indicated by the SSD estimates, the
high variance in this region was attributed to multiple sources of HG L1 among these
populations.

When 17 YSTRs were considered, the overall HG L1 variance for Indian
populations (0.4) was close to the variance of SDR speakers of Tamil
Nadu (0.41) and Andhra Pradesh (0.4), followed by Karnataka (0.36). The ASD of Tamil
Nadu was the highest (20,000 ± 4,700 Ybp). However, Kerala showed a
higher variance than the pooled variance of HG L1 itself. This could be the result of
gene flow into the populations of Kerala from various sources. The total variance
and ASD estimate for the 17 STRs were 0.4 and 14,382 ± 3,088 Ybp, whereas those
for the 9 STR data set were 0.35 and 14,127 ± 4,066 Ybp, although the latter
set included data from outside India as well. The high variance and ASD estimates in Tamil
Nadu again suggest an origin of L1-M27/76 there.
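The ASD based ages discussed above can be illustrated with a minimal sketch. It assumes the common definition of ASD as the squared difference in repeat number between each sampled haplotype and a founder (median) haplotype, averaged over samples and loci, with the age obtained by dividing ASD by a per-locus mutation rate. The haplotypes, the mutation rate and the generation time below are hypothetical illustration values, not those used in this study.

```python
def asd_age(haplotypes, founder, mu_per_generation, years_per_generation):
    """Return (ASD, age in years) relative to an assumed founder haplotype.

    ASD is averaged over samples and loci; age = ASD / mu generations.
    """
    n_loci = len(founder)
    total = 0.0
    for hap in haplotypes:
        total += sum((allele - ref) ** 2 for allele, ref in zip(hap, founder))
    asd = total / (len(haplotypes) * n_loci)
    generations = asd / mu_per_generation
    return asd, generations * years_per_generation

# Hypothetical three-locus haplotypes (repeat counts), not thesis data.
founder = (13, 16, 22)
samples = [(13, 16, 22), (14, 16, 22), (13, 15, 22), (13, 16, 24)]

# Hypothetical rate and generation time, chosen only to keep the arithmetic simple.
asd, age_years = asd_age(samples, founder,
                         mu_per_generation=0.001,
                         years_per_generation=25)
```

Because the age scales directly with ASD, a region whose haplotypes have drifted further from the founder haplotype, such as Tamil Nadu in the estimates above, yields an older estimate under the same rate.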


4.2.6: BATWING estimates of population parameters and phylogenetic tree 


BATWING age estimates showed a higher TMRCA and population expansion
time in Tamil Nadu (52,200 Ybp, 95% CI: 29,300-103,475 and 17,325 Ybp, 95%
CI: 10,125-30,825 respectively), which is consistent with the previous analyses. Indian
populations showed a higher effective population size of 1,760 (95% CI: 712-4,151),
higher population expansion times of 10,012 Ybp (95% CI: 5,664-20,240) and


Figure 43: NJ tree based on HG L1-M27/76 Rst distances for all Indian
and literature data (9 YSTRs)


Table 20: HG L1-M27/76 YSTR variance and ASD based age estimates from Indian study regions and literature data

[Table columns: Region studied; Language; N; pooled variance, ASD age and SE for 9 YSTRs; pooled variance, ASD age and SE for 17 YSTRs.]

Note: Only 9 YSTRs were available and used for comparison in this study; 17 YSTRs were used only for the Indian study populations.
* Representatives of CDR populations were Hill Gonds, Raj Gonds and Kolam.


Table 21: Effective population sizes and population expansion times of various regions based on BATWING analysis of HG L1 for 9 YSTRs

[Table columns: Region studied; Populations studied; N; Ancestral effective population size (Ne); Population expansion time.]

Note: Larger sample sizes from Afghanistan and Pakistan are warranted. Values within brackets refer to the 95% CI.
* The TMRCA calculated by BATWING showed wide intervals and has to be taken with high caution.


higher YSTR variances than the populations of Syria, Pakistan and Afghanistan, thus
ruling out the possibility of an origin of L1-M27/76 outside India (Table 22). Deccan
populations showed higher TMRCA, effective population size and population
expansion time than the North Indian populations (yellow cluster in the network, Fig 39),
thus confirming south India as the place of origin of HG L1-M27/76. The present
study suggests Tamil Nadu or Karnataka as the most probable region of origin of HG L1,
with concomitant expansion in Andhra Pradesh. The exact region of origin within
South India could not be deciphered with the present set of markers. More L1
subtype SNPs and more informative STRs may be required. It will be worthwhile to
investigate the same cohort further with such new markers in future.


Table 22: Effective population sizes, TMRCA and population expansion times of various regions based on BATWING analysis of HG L1 for 17 YSTRs

[Table columns: Region studied; Populations studied; N; Ancestral effective population size (Ne); TMRCA; Population expansion time.]


4.3 Genetic footprints of HG L3*-M357 - An Enigma 


4.3.1 Distribution of NRY HG L3*-M357 


NRY HG L3* is identified by the M357 SNP marker (a subtype of HG L-M20).
Though certain studies have identified the presence of this HG in India, Pakistan and
Afghanistan, its pattern of migration has not been characterised. Hence, in this chapter, I
have studied 210 Y chromosomes from India and 45 samples available from the literature
to address this question. In India, HG L3*-M357 was present at higher frequencies
(>10%) in the northern Indian states of Jammu (19.8%, FET 6.E-23) and Punjab (14%,
FET 3.E-06). The other states showed <6% frequency of HG L3*-M357 (Table 23). In
contrast to HG L1-M27/76, the frequency of HG L3*-M357 decreases towards the south.
These frequencies are also reflected in the contour maps based on HG L3*-M357
frequency and YSTR variance (Fig 44a, 44b). However, higher YSTR variance was
observed in Himachal Pradesh (0.38), Rajasthan (0.31), Jammu (0.23) and Kerala
(0.35). HG L3*-M357 was present at a frequency of 4.44% (FET 8.E-20) among IE
speakers and 1.79% (FET 7.E-03) among South Dravidian speakers (Table 24).


Table 24: Frequencies of NRY HG L3*-M357 in various language speakers of
India

Language | N studied | L3* (%) | p-value
AA       | 570       | 0.00    | 8.E-07
CDR      | 461       | 0.00    | 1.E-05
SDR      | 2793      | 1.79    | 7.E-03
IE       | 3246      | 4.44    | 8.E-20
TB       | 1312      | 1.14    | 4.E-04

Note: HG L3*-M357 is represented in high frequency in IE speakers.
TB: Tibeto Burman languages
IE: Indo European languages
AA: Austro Asiatic languages
SDR: South Dravidian languages
CDR: Central Dravidian languages
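The FET p-values in Table 24 come from Fisher's exact test. The sketch below shows how such a test can be computed for one language group against all the others. The 2x2 layout (IE speakers versus the pooled remaining groups) and the carrier counts are assumptions: the counts are back-calculated from the rounded percentages in Table 24 (about 144 carriers among 3,246 IE speakers and about 65 among the remaining 5,136 samples), so the result is illustrative and need not reproduce the tabulated value exactly.

```python
import math

def log_hypergeom(a, row1, col1, total):
    """Log-probability of a 2x2 table with top-left cell a and fixed margins."""
    def log_comb(n, k):
        return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_comb(col1, a) + log_comb(total - col1, row1 - a)
            - log_comb(total, row1))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - total)
    highest = min(row1, col1)
    log_obs = log_hypergeom(a, row1, col1, total)
    p = 0.0
    for x in range(lowest, highest + 1):
        log_px = log_hypergeom(x, row1, col1, total)
        if log_px <= log_obs + 1e-9:  # tables no more likely than the observed one
            p += math.exp(log_px)
    return min(p, 1.0)

# Small check table with a known answer (34/70).
p_small = fisher_exact_two_sided(3, 1, 1, 3)

# Assumed counts reconstructed from Table 24 percentages (not the raw data):
# rows = IE speakers / all other speakers, columns = L3* carriers / non-carriers.
p_ie = fisher_exact_two_sided(144, 3246 - 144, 65, 5136 - 65)
```

Working with log-probabilities keeps the hypergeometric terms numerically stable for sample sizes in the thousands, as in this data set.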


Table 23: Frequency distribution of NRY HG L3*-M357 in India and literature data

[Table columns: Region; Population; Language; Social ranking; Subsistence; N studied; L3* frequency; P-value.]

Note:
TB: Tibeto Burman languages
IE: Indo European languages
AA: Austro Asiatic languages
SDR: South Dravidian languages
p values >0.05 were removed


Figure 44a: Contour map showing the distribution of HG L3*-M357 in India based on its
frequency


Note: Populations with a sample size of at least 5 were used for constructing the contour
plots. High L3*-M357 frequency and variance were observed in Jammu, Himachal Pradesh,
Punjab and Rajasthan in north India, and in Kerala and Karnataka in south India.


To estimate the relative distance of each study region from the median
haplotype (the most probable ancient haplotype) using 17 YSTRs, SSD was
calculated (Table 25a). Andhra Pradesh (N=5 only) was represented at the minimum
distance, but its distribution was sporadic over the spectrum. Tamil Nadu had
haplotypes distributed continuously over SSD distances 1-10. Karnataka showed
continuous SSD distances from 1-3 and again from 5-9; at higher SSDs the distribution
was sporadic. The north Indian populations, in contrast, did not show the presence of
the median haplotype. Overall, the results showed lower levels of evolution.
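The SSD spectrum described above can be sketched as follows. The example assumes that SSD is the sum of squared differences in repeat number between a sample haplotype and the median haplotype over all loci, and it uses the per-locus modal allele as a simple stand-in for the network median; both choices, and the haplotypes themselves, are illustrative assumptions rather than the procedure used in this study.

```python
from collections import Counter

def modal_haplotype(haplotypes):
    """Most frequent allele at each locus, used here as a stand-in for the median."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*haplotypes))

def ssd(haplotype, reference):
    """Sum of squared repeat-number differences across loci."""
    return sum((allele - ref) ** 2 for allele, ref in zip(haplotype, reference))

def ssd_spectrum(haplotypes, reference):
    """Percentage of samples found at each SSD value from the reference."""
    counts = Counter(ssd(h, reference) for h in haplotypes)
    n = len(haplotypes)
    return {distance: 100.0 * count / n for distance, count in sorted(counts.items())}

# Hypothetical three-locus haplotypes for one region (not thesis data).
region = [(13, 16, 22), (13, 16, 22), (14, 16, 22), (13, 15, 23), (13, 16, 24)]

median = modal_haplotype(region)
spectrum = ssd_spectrum(region, median)
```

A region with entries at SSD 0 carries the median haplotype itself, and a continuous run of small SSD values corresponds to the stepwise radiation from the median that the text describes for Tamil Nadu and Karnataka.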

The phylogenetic relationship among the study regions was deciphered by
network analysis (Fig 45). The network showed two branches evolving in opposite
directions from a hypothetical central node. Geography specific clusters, with minimal
gene flow between the two clusters, were observed. Jammu, Himachal Pradesh and
Punjab clustered together (cluster 1 in Fig 45). All the populations in this cluster were
IE speaking populations. Cluster 2, in contrast, comprised mainly samples from the
Deccan, with an over-representation from Karnataka.

4.3.2 Comparison of HG L3*-M357 with the global populations 

SSD values were calculated based on 9 YSTRs for the global populations with
the data available from the literature (Table 25b). A higher proportion of samples from
the Deccan, Afghanistan and Pakistan regions indicated the presence of the
ancient median haplotype. This was further reflected in the network analysis. Among the
North Indian populations, only Punjab showed a continuous spectrum of SSD distances
(1-6), whereas the others were distributed sporadically.

The network analysis including the literature data with 9 YSTRs showed the
central node comprising populations from Himachal Pradesh, Punjab and Rajasthan.
Surprisingly, no haplotypes from these study regions were shared with their


Figure 45: Reduced Median Phylogenetic network for HG L3* among Indian populations (17 YSTRs)

Note: Each node denotes a haplotype defined by 17 YSTRs. The area of the circle is proportional to the haplotype frequency. The network denotes isolated evolution of North Indian populations.


Table 25a: SSD distances of HG L3* computed from median haplotype among various study states

[Table columns: Region; percentage of samples that are distanced from the central median haplotype at each SSD value.]


Table 25b: SSD distances of HG L3* computed from median haplotype among various global regions

[Table columns: Region; N studied; percentage of samples that are distanced from the central median haplotype at each SSD value.]


geographical neighbours, Pakistan and Afghanistan (Fig 46, Table 26). The
populations of Karnataka shared haplotypes with populations of Afghanistan,
Pakistan and East Caucasus. Jammu formed a distinct cluster with an over-representation
of a pastoral group of the Himalayas, the Brokpa. The median haplotype was
shared by the Punjab, Himachal Pradesh and Rajasthan populations. The composition of
this median haplotype is presented in Table 26. The median haplotype for
YSTRs D389a, D389b, D390, D19, D391, D392, D393, D439 and D388 in cluster
1 (north India) was 13-16-22-15-10-14-12-12-11, and that of cluster 2 (Deccan,
Pakistan, Afghanistan) was 13-16-22-15-10-14-12-12-12. The north Indian and south
Indian clusters differed at the D388 locus by a single step mutation.
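The single step difference between the two median haplotypes can be checked locus by locus. The short sketch below uses the two haplotypes and the locus order exactly as quoted above; the helper name `parse_haplotype` is introduced here for illustration only.

```python
# Locus order as listed in the text for the 9 YSTR comparison.
LOCI = ["D389a", "D389b", "D390", "D19", "D391", "D392", "D393", "D439", "D388"]

def parse_haplotype(text):
    """Turn a dash-separated haplotype string into a list of repeat counts."""
    return [int(value) for value in text.split("-")]

cluster1 = parse_haplotype("13-16-22-15-10-14-12-12-11")  # north India
cluster2 = parse_haplotype("13-16-22-15-10-14-12-12-12")  # Deccan, Pakistan, Afghanistan

# Loci at which the two median haplotypes carry different repeat numbers.
differences = {locus: (a, b)
               for locus, a, b in zip(LOCI, cluster1, cluster2) if a != b}

# Total number of repeat steps separating the two haplotypes.
total_steps = sum(abs(a - b) for a, b in differences.values())
```

The comparison returns a difference only at D388 (11 versus 12 repeats), i.e. one mutational step between the two cluster medians.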

The Rst genetic distances of the study regions, along with data from the literature, were
displayed in an MDS plot (Fig 47). Most of the north Indian populations, i.e., Punjab,
Rajasthan, Himachal Pradesh and Jammu, clustered together, and the Deccan
populations formed a distinct cluster. Brahmin Havyaka, an IE speaking population, the
Chechen populations of East Caucasus and the Rajasthan populations also clustered with
the north Indian group. Gowda of Karnataka was close to the Pashtun of Afghanistan.

The pooled ASD estimate of HG L3*-M357 in Indian populations was 13,376
± 3,181 for 17 YSTR loci. In India, higher variance and ASD were found in the Himachal
Pradesh populations (14,919 ± 4,740). Taking into account the study regions and
other neighbouring regions (9 YSTRs), Afghanistan showed the highest ASD age (15,200
± 4,400). Himachal Pradesh of India showed a variance of 0.21 and an ASD of 8,891 ±
3,880. The ASD ages in the South Indian study states were lower. However, Kerala of
Deccan India showed a higher variance (0.42) and ASD (18,100 ± 7,000) than the
pooled YSTR variance and ASD of the haplogroup itself. This indicates the
possibility of gene flow into this region. This estimate has to be interpreted more


Figure 46: Reduced Median Phylogenetic network for HG L3*-M357 among Indian populations and global populations (9 YSTRs)

Note: Each node represents a YSTR haplotype. The area of the circle is proportional to the frequency. It is to be noted that extensive haplotype sharing has been observed among global populations and south Indian populations. North Indian populations show isolated evolution. The composition of the nodes is given in Table 26, corresponding to the number mentioned near each node.


Table 26: Composition of various populations represented in network Figure 46

[Table columns: Node no.; Region; Population; Count.]


Figure 47: Rst based MDS plot for HG L3* for Indian along with global
populations for 9 YSTRs

Legend (regions): Afghanistan, Assam, East Caucasus, Himachal Pradesh, Jammu, Karnataka, Kerala, North Orissa, Pakistan, Punjab, Rajasthan, Tamil Nadu.

Note: Populations with a sample size of at least 3 were considered for this
analysis. The population codes are given in Appendix 12.


cautiously, as the sample size from Kerala was only 7. A larger sample from Kerala
may throw further light on this issue.

The BATWING based phylogenetic tree (Fig 48) showed an affinity of the Pakistan and
Afghanistan populations to the South Indian populations. Jammu, Himachal Pradesh,
Punjab and Rajasthan formed a distinct, separate cluster with a coalescence time of 18,789
years. The TMRCA of the samples from all the studied states was 21,796 (21,251-23,010) and
the population expansion time was 19,914 (19,245-20,296) (Table 27). The evidence
from the network, BATWING and other analyses thus indicates an origin for HG L3*-M357
~18,700 Ybp. Soon after its formation, two distinct migrations took place: one from
Afghanistan to Deccan India, presumably by the coastal route, and the other towards the
north-western and northern frontiers of India. The most probable route was from East
Caucasus 9,421 Ybp (Fig 48) via Pakistan and Karnataka, branching from there to Kerala
and Tamil Nadu on the one hand (5,000 Ybp) and to Afghanistan and Andhra Pradesh on the
other.


Figure 48: BATWING based phylogenetic tree for HG L3*-M357 with Indian and global study regions using 9 YSTRs

Note: The numbers represent ages in Ybp.


Table 27: HG L3*-YSTR variance, ASD and BATWING based age estimates from Indian study regions and literature data

[Table columns: Region; N studied; Language family; variance, ASD and SE for 9 YSTRs and for 17 YSTRs; ancestral effective population size; TMRCA; population expansion time.]

Note: Only 9 YSTRs were available and used for comparison in this study; 17 YSTRs were used only for the Indian study populations. Ancestral effective population size, TMRCA and population expansion time were estimated by BATWING.


4.4 Origin and dispersal of NRY HG J2a*-M410 

The spread of HG J has been associated with agriculture. Hence, to decipher
its distribution pattern in India and its association with agriculture, I have studied HG
J2a* and HG J2b, comprising 329 HG J chromosomes from the Genographic study
populations out of the total 5,033 samples studied. I have included 231 YSTR haplotypes
from 6 regions from the literature for comparative analysis and to derive a holistic
picture. The frequencies of the HG taken from the literature, along with their references,
are mentioned in the Review of Literature section.
4.4.1 Phylogeography of HG J2a*-M410: 

The distribution of NRY HG J2a*-M410 among Indian populations is presented in
Table 28. HG J2a* was found mostly in caste populations (5.8%), whereas in
tribes its frequency was 1.7%. Linguistically, IE speakers showed the highest proportion
of HG J2a* in India (6.2%, FET 7.E-18) (Table 29). Fig 49a represents the contour plot
based on the frequencies of HG J2a* in the various study states. Gujarat populations such
as Maldhari (58%) and Patel (26.8%) showed high frequencies. A frequency of >20% was
observed in four populations of Himachal Pradesh. The frequencies decrease towards the
northeast and the southern Deccan. The STR variance based contour plot (Fig 49b) showed
two hot spots of high average STR variance, at Uttar Pradesh and Maharashtra. The high
frequency (33.7%) and YSTR variance (0.70) in Maharashtra were contributed mainly by the
Parsee population.


4.4.2: Phylogenetic network analysis: 


A phylogenetic reduced median network of 17 STRs of HG J2a*-M410 at the pan
India level (Fig 50a, 50b) revealed a median connecting link with reticulations but no
samples. The populations were represented in three consecutive tiers around this
median. The innermost tier contained south Indian populations, the middle tier and the


Table 28: NRY HG J2a*-M410 frequency distribution in India

[Table columns: Region; Population; Language; Social ranking; Subsistence; N studied; N+; J2a* frequency; p-value.]

N+ refers to the number of samples positive for the derived state of the allele.


sosensury ueuLIng O}OqLL GL 

SosensuUr] ULIPIAVIG YNOS YAS 

sosensury ueodoing opuy :q] 

sosensuryl UeIplAvig [eyued -ydo 

sosensury o1eVIsy ONsNY :VV 

»Zf DH Jouonsodoid 1sysry pomoys suoryerndod suryeods gy] 
0]0N 


BIpUyT JO SAdyveds 9SENSUL] SNOLIVA Ul OL PI-x8Zf DH AYN JO Sotpusnbasy 367 IQ" 


Figure 49a: Contour map showing the distribution of HG J2a* in India based on its
frequency

Figure 49b: Contour map based on YSTR variance of HG J2a* in India

Note: NRY HG frequency was found to be highest in populations of Gujarat, whereas
YSTR variance was highest in Uttar Pradesh and Maharashtra


Figure 50: Reduced median phylogenetic network for HG J2a*-M410 for Indian
populations based on 17 YSTRs
Panel A: Populations from Deccan India and Gujarat are coloured
Panel B: Populations from North Indian regions are coloured
Legend (both panels): Karnataka, Assam, Rajasthan, North Orissa, Bengal, Uttar Pradesh,
Maharashtra, Andhra Pradesh, Himachal Pradesh, Bihar, Punjab, Gujarat, Jammu, Tamil Nadu,
Meghalaya, Manipur, Kerala, Chattisgarh, Haryana
Note: North Indian populations lie mainly toward the periphery of the network, whereas
Deccan Indian populations are mainly concentrated in the inner circle


last tier were represented by west Indian and north Indian populations showing 
multistep mutations and concerted evolution within the population. 
4.4.3 Comparison with Global populations: 

The absence of samples at the median node of the 17 YSTR network indicated the
absence of a founder haplotype in India. To examine this further and to relate the
Indian haplotypes to global populations, three approaches were taken using 8 YSTR
loci: phylogenetic network analysis, multidimensional scaling (MDS), and ASD and
BATWING based age estimation. The RM network (Fig 51a, 51b) contained two
interconnected median nodes. The first was formed by 11 samples from Lebanon and one
each from Palestine, Europe, Uttar Pradesh and Karnataka; a large expansion from this
node led to samples from Himachal Pradesh, Europe and Maharashtra. The haplotype was
13-16-23-14-10-11-12-15 for loci D389a, D389b, D390, D19, D391, D392, D393 and D388
respectively. The other node consisted mainly of the Mauria population from Bihar and
branched out to many Middle Eastern, European and Indian populations. Haplotype
sharing, with stepwise mutations, was observed between populations of Afghanistan and
Indian populations, especially those of west and north India.

The MDS plot computed for these samples was striking (stress value: 15.29). At
the resolution of 8 YSTRs, the Jammu, Himachal Pradesh, Rajasthan and Uttar Pradesh
populations were clearly separated from the rest of the populations studied in all
three dimensions, most conspicuously in Dimension 1 (Fig 52). The other populations,
strewn on the left, also showed some clustering. Populations from Lebanon, east Asia
and Europe lay in the middle of the other populations from Deccan. A similar picture
was obtained in the NJ tree computed using MEGA (Fig 53).


Figure 51: Reduced median phylogenetic network for HG J2a*-M410 based on 8 YSTRs
Panel A: Global regions are coloured
Panel B: Indian regions with frequency > 5% are coloured
Legend (both panels): Afghanistan, Andhra Pradesh, Assam, Bihar, East Asia, Europe,
Gujarat, Himachal Pradesh, Jammu, Karnataka, Kerala, Lebanon, Maharashtra, North Orissa,
Pakistan, Palestine, Punjab, Rajasthan, Tamil Nadu, Uttar Pradesh


Figure 52: Rst based MDS plot for HG J2a* for Indian and global populations for 8
YSTRs
Legend entries legible in the source: Afghanistan, Andhra Pradesh, Assam, East Asia,
Europe, Gujarat, Himachal Pradesh, Jammu, Maharashtra, Middle East, Kerala, Rajasthan,
South Pakistan, Tamil Nadu, Uttar Pradesh

Note:
Circles indicate Indian populations. Squares, diamonds and triangles represent global
populations. The population codes are mentioned in Appendix 12.

North Indian populations showed distinct YSTR signatures, whereas the South Indian
populations clustered along with global populations.


Figure 53: NJ tree based on YSTR Rst distances with global populations with 8 YSTRs
[Tree image; leaf labels are illegible in the source scan.]


4.4.4: ASD and BATWING based age estimates:


The YSTR based variance and ASD age estimates for 8 YSTRs and 17 YSTRs
are listed in Table 30. The YSTR based variance (for both 8 and 17 STRs) showed a
gradual decrease towards east and northeast India. When 8 YSTRs were considered
within India, Andhra Pradesh, Maharashtra and Karnataka showed higher variance
(0.62, 0.48 and 0.52 respectively) and ASD estimates (27,390 ± 6 Ybp, 20,758 ±
4,683 Ybp and 21,874 ± 3,097 Ybp respectively). Lebanon also showed high variance
(0.53), similar to that of Karnataka, with an ASD of 20,591 ± 5,490 Ybp. North Indian
study states showed lower variance and ASD estimates. When 17 YSTRs were
considered, Karnataka and Maharashtra consistently showed higher variance (0.5) and
ASD (~22.5 Kya). Jammu and Afghanistan showed similar variance and ASD
(0.6, ~25 Kya). Kerala and Bihar also showed higher variances and ASD. To test whether
the high variance in 8 YSTRs in these study states was due to gene inflow, mismatch
distributions were calculated and are presented in Fig 54 with their MPD values. The
mismatch distributions of Himachal Pradesh, Maharashtra, Karnataka and Punjab
showed unimodal peaks with MPD values of 4.0, 4.9, 5.2 and 2.4 respectively,
indicating recent demographic expansion in these regions. The mismatch distribution
of Lebanon also showed a unimodal peak, with an MPD of 4.9. Other regions showed
relatively multimodal peaks, indicating the possibility of gene inflow from various
sources into these populations.
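
The variance and ASD based ages discussed above can be illustrated with a minimal sketch. It computes the mean per-locus variance of repeat counts and the average squared difference (ASD) from a reference haplotype, and converts ASD to years as ASD/μ generations. The toy haplotypes, the use of the modal haplotype as the reference, the mutation rate (6.9e-4 per locus per generation) and the 25 year generation time are illustrative assumptions; this section does not state the parameters actually used.

```python
from collections import Counter
from statistics import mean, pvariance

MU = 6.9e-4            # assumed mutation rate per locus per generation
GENERATION_YEARS = 25  # assumed generation time in years


def mean_locus_variance(haplotypes):
    """Average, over loci, of the population variance of repeat counts."""
    return mean(pvariance(locus) for locus in zip(*haplotypes))


def modal_haplotype(haplotypes):
    """Per-locus modal repeat count, used here as a stand-in for the founder."""
    return tuple(Counter(locus).most_common(1)[0][0] for locus in zip(*haplotypes))


def asd_to_reference(haplotypes, reference):
    """Average squared repeat difference from the reference, over samples and loci."""
    return mean(
        mean((a - r) ** 2 for a, r in zip(hap, reference))
        for hap in haplotypes
    )


def asd_age_years(asd, mu=MU, generation_years=GENERATION_YEARS):
    """Convert ASD to years, assuming ASD grows by mu per generation."""
    return asd / mu * generation_years


# Hypothetical 8-locus haplotypes (repeat counts), not data from the thesis.
toy = [
    (13, 16, 23, 14, 10, 11, 12, 15),
    (13, 16, 23, 14, 10, 11, 12, 15),
    (13, 16, 24, 14, 10, 11, 12, 15),
    (12, 16, 23, 14, 10, 11, 12, 16),
]
ref = modal_haplotype(toy)
asd = asd_to_reference(toy, ref)
print(ref, mean_locus_variance(toy), asd, round(asd_age_years(asd)))
```

Because the age scales linearly with ASD and inversely with the mutation rate, the choice of rate and generation time dominates the absolute dates; comparisons between regions are less sensitive to it.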

The ancestral effective population size for NRY HG J2a*-M410 for all Indian
study populations was calculated to be 37,780 (95% CI: 25,698-56,378) and the TMRCA
40,233 (95% CI: 27,338-61,408) at 17 YSTR resolution. The ancestral Na,
effective population size, TMRCA and population expansion times based on 8 YSTR


Table 30: HG J2a*-M410 YSTR variance and ASD +/- SE in different regions 


po STRs CT STRs 


moe Tools] iso] || 
Paesine ——~| 30 | 070 26872] sos] | | _ 
Lebanon ———S—S~s 8 | oss] sao | | 
Afghenisan | _20 | oaa]_asiaf 7| | | 
Pakisin ——~—S*d is | 0] por] aa@] | | 
wt Asa ——S*sC os] itso] asf | | 
fmm ——————*iYt0| 02s] az08a[ 2.717] 0.60] 26.428] 1.600 
Punjab 9 | ona] tar7] arta | 029) 14200 | 5.027] 
TamilNada (| 28 | 0ae] 18.07| 7.23] 069] 0,100] 6800 
pom rt [ 9 [ol arsso[ [oz — at 
North Orisa =~ oa] i6.7s| avo] 047] 19,800) 2.600 
[UtarPradesh———~(| 17 | 0s] 8240] 043] _19.182[ 450] 
Bir ———*ds | 030) 14980] a6] 0] 25503] aca 
asm ——S—S*d 8s | oa] ser] aos] 039] isms] 2.901 
Allindia poked | 329 | 0.83 20625] arse] 0.0] 26.46] 420 
Pooled ASD all populations | 523 | 0499934] aai0f | ‘|_| 


Note: 

Var: Variance 

ASD: Average Squared Difference 

SE: Standard Error 

Only 8 YSTRs were analysed for data obtained from literature for the purpose of comparison with the present
study.

17 YSTRs were used only from India study populations 

Regions with sample size of 5 and above were considered for the analysis 


Figure 54: Mismatch distributions based on 8 YSTR loci of HG J2a*-M410 among all
study regions
Y axis: relative frequency. MPD values legible on the panels:
Lebanon: 4.992
East Asia: 3.467
Europe: 3.583
Palestine: 5.752
Afghanistan: 3.589
Jammu: 4.056
Pakistan: 3.867
Himachal: 4.051
North Orissa: 5.238
Uttar Pradesh: 4.603
Bihar: 3.410
Assam: 4.214



loci (Table 31) were all highest in Lebanon, suggesting Lebanon as the most
probable homeland of HG J2a*-M410.

Only a few subclades of HG J2a*-M410 were seen: HG J2a4a-M47 and HG J2a4c-M68
were identified sporadically. It was interesting to note that these HGs were mainly
present towards Deccan India. HG J2a4a was specific to the Parsee population of
Maharashtra (9.3%, FET 5.E-11). The average variance of these haplotypes was 0.1,
with an ASD age of 5,062 ± 2,743 Ybp. NRY HG J2a4c-M68 was present at high frequency
among the Thoda, a Nilgiri hill tribe of Tamil Nadu (50%, FET 6.E-14), and at a lower
frequency among the Yerava, an agriculture based tribe of Karnataka (9%, FET 6.E-06).
Yerava showed an average YSTR variance of 0.48 with an ASD based age of
20,887 ± 5,900 Ybp. Thoda showed a YSTR variance of 0.10 and an ASD of
4,795 ± 1,841 Ybp. This fits well with the oral history of migrations of the Thodas
(Arunkumar et al., 2012). Presumably they were isolates of these migrations from the
Middle East, arriving early, before the local evolution and caste formation.
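
The FET values quoted for population-specific haplogroups come from Fisher's exact test on a 2x2 table (carriers and non-carriers in the focal population versus the rest of the sample). A minimal two-sided sketch using only the standard library is given below. The counts in the usage line are hypothetical, and whether the thesis used a one-sided or two-sided test is not stated here, so the function is an illustration of the test rather than a reproduction of the reported values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a: carriers in the focal population, b: non-carriers in it,
    c: carriers elsewhere,               d: non-carriers elsewhere.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the focal population.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: 10 of 20 carriers in one tribe, 5 of 480 elsewhere.
print(fisher_exact_two_sided(10, 10, 5, 475))
```

With small focal populations, a haplogroup can reach a very low p value only if it is nearly absent elsewhere, which is why the test is a useful check on frequencies based on few samples.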


Table 31: Effective population sizes, TMRCA and population expansion times based on
BATWING analysis of HG J2a*-M410 for 8 YSTRs
Legible column headings: Region; N studied; Ancestral effective population size (Na);
BATWING estimate of contemporary populations; Population expansion time
[Table body illegible in the source scan.]


4.5 Studies on NRY HG J2b-M221/M102
In the present study, 266 chromosomes of HG J2b-M221/102 were studied out of
the total of 4,394 samples. The analysis also includes 57 YSTR haplotypes obtained
from literature for comparison with the study populations. The frequencies of the
populations taken from literature, along with their references, are given in the
review of literature section.


4.5.1 Phylogeography of HG J2b-M221/102 in India: 


Appreciable frequencies of HG J2b (~20%) were seen in the majority
of the populations of Himachal Pradesh, Punjab and Rajasthan (Table 32).
In addition, almost all the hunter gatherer populations of Tamil Nadu, many of them
hill tribes identified in Sangam literature, showed appreciable (~10%) frequencies of
this HG. Overall, this clade is seen at very low frequencies throughout India (Fig 55a,
55b), indicating that it is either pervasive or a remnant. The contour plot based on the
17 YSTR variance revealed high YSTR variance among the populations of Jammu and the
southern tip of India. Linguistically, this HG did not show affinity with any
language family.


4.5.2: Phylogenetic network analysis: 


The reduced median network of HG J2b-M221/102 (Fig 56) showed a
hypothetical median (central) node with clear branches radiating in all directions.
The 3 to 5 tiers in the network contained many samples, most of them from Deccan and
Gujarat. Select populations from the northern states of India, particularly Himachal
Pradesh, Rajasthan and Punjab, i.e. the IE speaking belt, were mostly seen at the
12 o'clock, 3 o'clock and 10 o'clock positions in the terminal and peripheral outer
layers of the network.


Table 32: NRY HG J2b-M221/102 frequency distribution in India
(State labels and cell values are given only where legible in the source scan.)

State | Population | Language | Social status | Subsistence | N | N+ | Frequency | FET p
 | [illegible] | | Caste | Wet land farming/Warrior | 20 | 2 | 0.1000 |
 | Brahmin Gaddi | IE | Caste | Wet land farming | | 2 | |
 | Gaddi | IE | Tribe | Wet land farming | 6 | 1 | 0.1667 |
 | Gaddi Halli | IE | Tribe | Wet land farming/Warrior | 5 | 1 | 0.2000 |
 | Gaddi Sippi | IE | Tribe | Wet land farming/Warrior | 8 | 2 | 0.2500 |
 | Gujar | IE | Caste | Wet land farming/Warrior | 41 | 3 | 0.0732 |
 | Rajput Gaddi | IE | Tribe | Wet land farming/Warrior | 108 | 8 | 0.0741 |
 | Brahmin Punjab | IE | Caste | Brahmin | 40 | 1 | 0.0250 |
 | [illegible row] | | | | | | |
 | Sikh | IE | Caste | Wet land farming | 8 | 2 | 0.2500 |
Rajasthan | Banya | IE | Caste | Dry land farming | 5 | 2 | 0.4000 |
Rajasthan | Brahmin Goud | IE | Caste | Priesthood/agriculture | 16 | 1 | 0.0625 |
Rajasthan | Charan | IE | Caste | Dry land farming | 66 | 11 | 0.1667 | 2.E-03
Rajasthan | Jain Raj | IE | Caste | Dry land farming | 56 | 7 | 0.1250 |
Rajasthan | Meena | IE | Tribe | Dry land farming | 58 | 5 | 0.0862 |
Rajasthan | [illegible] | | Caste | Artisan | | 2 | |
Rajasthan | Rajput Rajasthan | IE | Caste | Dry land farming/Warrior | 36 | 2 | 0.0556 |
Gujarat | Brahmin Kutchi | IE | Caste | Priesthood | 16 | 1 | 0.0625 |
Gujarat | Koli | | Caste | Fishing | 19 | 3 | 0.1579 |
Gujarat | Kotwalia | IE | Tribe | Artisan | | 2 | |
Gujarat | Vasava | IE | Tribe | Foragers | 38 | 7 | |
Maharashtra | Dhangar | IE | Caste | Pastoral | 40 | 1 | 0.0250 |
Maharashtra | Gond M | CDR | Tribe | Dry land farming | 37 | 1 | 0.0270 |
Maharashtra | Katkari | IE | Tribe | Artisan | 56 | 1 | 0.0179 |
Maharashtra | Korku | AA | Tribe | Foragers | 40 | 2 | 0.0500 |
Maharashtra | [illegible] | | Caste | Trade | 86 | 3 | 0.0349 |
Maharashtra | Warli | IE | Tribe | Dry land farming | 44 | 2 | 0.0455 |

[Rows for the remaining states, including Tamil Nadu, are illegible in the source scan.]

Figure 55a: Contour map showing the distribution of HG J2b in India based on its
frequency


Note: HG J2b-M221/102 was seen in low frequencies in India. YSTR variance was high in
Himachal Pradesh and the southern tip of India


Figure 56: Reduced median phylogenetic network for HG J2b in Indian study
populations (17 YSTRs)
Panel A: Populations from Deccan India and Gujarat are coloured
Panel B: Populations from North India are coloured
Legend (both panels): Arunachal Pradesh, Assam, Rajasthan, Karnataka, Gujarat, Himachal
Pradesh, Punjab, Uttar Pradesh, North Orissa, Tamil Nadu, Bengal, Andhra Pradesh,
Maharashtra, Jammu, Bihar, Manipur, Chattisgarh, North Kerala
Note: Each circle denotes a haplotype. The size of the circle is proportional to the
haplotype frequency. South Indian populations are represented in the inner circle all
around the central hypothetical node. North Indian populations are represented mainly
at the 12 o'clock, 3 o'clock and 9 o'clock positions


4.5.3 Comparison of HG J2b with global populations 


Considering the 8 STR haplotypes of J2b from the present study along with those
available in literature revealed some interesting observations. The network in Fig 57
showed a median node. The haplotype of the median node was 12-16-24-15-10-11-12-15
for YSTRs D389a, D389b, D390, D19, D391, D392, D393 and D388. The median
haplotype was shared by 5 samples from Europe (Ashkenazi), 10 from Tamil Nadu, 8 from
Rajasthan, 5 from Greece, 3 from Andhra Pradesh and Punjab, 2 from Assam and Karnataka,
and 1 each from Uttar Pradesh, Cyprus, Arunachal Pradesh, Gujarat, Maharashtra, Lebanon,
Himachal Pradesh, Palestine and West Eurasia. Further, there were many radiating and
expanding branches at various tiers of evolution, with most of the nodes consisting of
samples from both India and the other regions studied. This haplotype sharing suggests
recent common ancestors and an extensive geographical spread of this HG in India and
nearby regions.
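
As a small illustration of how haplotypes in such a network relate to one another, the sketch below counts stepwise mutation steps between two 8 YSTR haplotypes. It compares the J2b median haplotype given above with the J2a*-M410 haplotype reported in section 4.4.3. Summing absolute repeat differences is an assumption that follows the stepwise mutation model; the helper names are hypothetical and not taken from any network software.

```python
LOCI = ("D389a", "D389b", "D390", "D19", "D391", "D392", "D393", "D388")


def parse_haplotype(text):
    """Turn '12-16-24-...' into a tuple of repeat counts, one per locus."""
    return tuple(int(x) for x in text.split("-"))


def step_distance(h1, h2):
    """Sum of absolute repeat differences (single-step mutation steps)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same loci")
    return sum(abs(a - b) for a, b in zip(h1, h2))


def differing_loci(h1, h2, loci=LOCI):
    """Names of the loci at which the two haplotypes differ."""
    return [name for name, a, b in zip(loci, h1, h2) if a != b]


j2b_median = parse_haplotype("12-16-24-15-10-11-12-15")
j2a_node = parse_haplotype("13-16-23-14-10-11-12-15")  # from section 4.4.3
print(step_distance(j2b_median, j2a_node), differing_loci(j2b_median, j2a_node))
# -> 3 ['D389a', 'D390', 'D19']
```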

The two dimensional MDS plot of the data gave a stress value of 20.27; the number
of dimensions was therefore increased to 3, which reduced the stress value to 13.29.
However, the picture was not as clear as that obtained with J2a (Fig 58). The north
Indian samples, particularly those from Himachal and nearby regions, and the Middle
Eastern and European populations appear in the lower half, discriminated by Dimension 2.
The NJ tree computed based on the Rst distances (Fig 59) shows that north Indian
populations have greater affinity to global populations than to the Deccan region.
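
Why adding a dimension lowers the stress can be seen in a toy case. The sketch below assumes Kruskal's stress-1 formula (the section does not say which stress measure the MDS software reported, and the values 20.27 and 13.29 look like percentages of such a measure). Four mutually equidistant items cannot be laid out in two dimensions without distortion, but fit exactly as a tetrahedron in three. The coordinates are hand-picked for illustration, not the result of an MDS optimisation.

```python
from itertools import combinations
from math import dist, sqrt


def stress1(points, target):
    """Kruskal stress-1 between configuration distances and target distances.

    points: list of coordinate tuples; target: dict {(i, j): dissimilarity}, i < j.
    """
    num = den = 0.0
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        num += (d - target[(i, j)]) ** 2
        den += d ** 2
    return sqrt(num / den)


# Target: every pair of the four items is equally dissimilar.
target = {pair: sqrt(8) for pair in combinations(range(4), 2)}
square_2d = [(1, 1), (1, -1), (-1, -1), (-1, 1)]               # one 2-D layout
tetra_3d = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]  # exact in 3-D

print(round(100 * stress1(square_2d, target), 2),
      round(100 * stress1(tetra_3d, target), 2))
```

The same logic applies to the population data: a third dimension gives the configuration more freedom to honour the Rst distances, at the cost of a plot that is harder to read.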

4.5.4 ASD and BATWING based age estimates: 

In the variance calculations, Palestine populations showed higher variance (0.28)
and an ASD of 14,153 ± 6,430 Ybp (Table 33). Among Indian study states, Karnataka,
Tamil Nadu, Maharashtra and Himachal Pradesh showed high STR based variance
based on both 8 and 17 YSTRs. Both Himachal Pradesh and Tamil Nadu showed an


Figure 57: Reduced median phylogenetic network for HG J2b for Indian and global
populations (8 YSTRs)
Legend: Spain, East Caucasus, Lebanon, Middle East, India, Turkey, East Asia, West
Eurasia, West Caucasus, Greece, North Africa, Europe
Note: Populations from different regions of India are coloured


Figure 58: Rst based MDS plot for HG J2b for Indian and global populations using
8 YSTRs
Legend entries legible in the source: Karnataka, Maharashtra, Middle East, Punjab,
Rajasthan, Tamil Nadu, Turkey, Uttar Pradesh
Note: Circles represent Indian populations. Diamonds and triangles represent global
populations. Population codes are given in Appendix 16.


Figure 59: NJ tree based on YSTR Rst distances for HG J2b-M221/102 with Indian
and global populations for 8 YSTRs
[Tree image; leaf labels are not reliably legible in the source scan.]


Table 33: NRY HG J2b-M221/102 ASD based age estimates in different geographical regions
[Table body illegible in the source scan.]

Note:

Var: Variance

ASD: Average Squared Difference

SE: Standard Error

Only 8 YSTRs were analysed for data obtained from literature for the purpose of comparison with the present
study.
17 YSTRs were used only from India study populations

ASD of ~20 Kya with 17 YSTRs. However, only the Rajasthan and Tamil Nadu populations
showed a unimodal peak in their mismatch distributions, with MPD values of 2.7 and 3.6,
suggesting recent expansion in these regions (Fig 60). The YSTR variance
(0.21) and ASD (8,907 ± 1,460) were lower in Rajasthan than in Tamil Nadu. Palestine
did not show a smooth unimodal peak, so its higher variance could also be attributed
to gene flow.
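
The mismatch distributions and MPD values referred to above can be sketched as follows. Here a "mismatch" between two haplotypes is taken to be the number of loci at which they differ, and MPD is read as the mean of these pairwise differences; both are assumptions about the method, since the software settings are not given in this section. The haplotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_mismatches(haplotypes):
    """Number of loci at which each pair of haplotypes differs."""
    return [
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    ]


def mismatch_distribution(haplotypes):
    """Relative frequency of each mismatch class, plus the mean (MPD)."""
    diffs = pairwise_mismatches(haplotypes)
    counts = Counter(diffs)
    freq = {k: counts[k] / len(diffs) for k in sorted(counts)}
    mpd = sum(diffs) / len(diffs)
    return freq, mpd


# Hypothetical 8-locus haplotypes, not data from the thesis.
toy = [
    (12, 16, 24, 15, 10, 11, 12, 15),
    (12, 16, 24, 15, 10, 11, 12, 15),
    (12, 16, 25, 15, 10, 11, 12, 15),
    (13, 16, 24, 15, 10, 11, 13, 15),
]
freq, mpd = mismatch_distribution(toy)
print(freq, mpd)
```

A single smooth peak in `freq` is the pattern read as a recent expansion, whereas several separated peaks suggest a mixture of lineages with different histories.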

In the BATWING analysis of 17 YSTRs for all Indian HG J2b populations, the
effective ancestral population size was calculated to be 4,358 (95% CI 28,575-6,758)
and the TMRCA 30,952 (95% CI 20,598-47,345). 5 sets of BATWING computations
were further carried out for each region. Lebanon showed the highest ancestral
effective population size and TMRCA (Table 34), but the population expansion time
was higher in India (15,431; 95% CI: 8,158-30,912). Thus the overall results indicate
an extensive geographical spread of this HG, which has been expanding for a longer
time in India.


Figure 60: Mismatch distribution based on 8 YSTR loci of HG J2b-M221/102 in all study states
Y axis: relative frequency. MPD values legible on the panels:
Lebanon: 3.170
Palestine: 3.393
Cyprus: 3.000
Europe: 2.225
Himachal: 3.868
Punjab: 2.667
Rajasthan: 2.761
Gujarat: 2.769
Maharashtra: 4.244
Karnataka: 4.698
Tamil Nadu: 3.651
Andhra Pradesh: 3.574
North Orissa: 3.905
Uttar Pradesh: 3.667
Assam: 3.600


Table 34: Effective population sizes, TMRCA and population expansion 
times based on BATWING analysis for HG J2b-M221/102 (8 YSTRs) 


Region | Effective population size (Na) | TMRCA | Population expansion time


373 (114-1302) 17,554 (6,643-49,417) 10,011 (1,746-35,899) 


68,012 (28,571-1,78,830) 8,376 (1442-42,935) 


The ages were computed using different BATWING runs and the results need to be 
interpreted with great caution 


4.6: Nattukottai Chettiar- A case study on caste formation 

In terms of genetics, a caste can be defined as an inbreeding unit. South India is
characterized by a rigid caste system and endogamy, although the formation of caste
units may not have been uniform across India. To investigate the formation of caste
based on the Y chromosome, I studied the NRY profile of the Nattukottai Chettiar (NC)
community. The ethnographic notes are presented in Table 1e. The community is divided
into 9 patrilineal clans, or Kovils, and sub-clans. Each clan is exogamous, while the
caste as a whole practises endogamy. Fig 61 shows the distribution of the various
Kovils of NC in the Chettinad region of Tamil Nadu.


4.6.1 NRY HG frequency distribution in various sub-clans of NC 


Tables 35 and 36 show the Y HG frequencies and Fisher's p values by clan
and sub-clan. It was interesting to note that some YHGs are specific to certain
clans. Mathur_Arumbakur showed 100% HG F*-M89 (FET 1.E-11). HG H1a*-M82
showed 100% representation in Mathur_Uraiyur (FET 1.E-08) and Mathur* (*
refers to unclassified clan) (FET 2.E-01). HG J2a*-M410 was at 100% in
Elayathankudi_Okkur (FET 7.9E-09). HG J2b-M221/M102 was mainly localised in the
Vairavan clan. Mathur_Manalur, Elayathankudi and Surakudi mainly showed HG
L1-M27/76. HG O2a-M95 was present in Mathur_Kulathur at a frequency of 100%
(FET 3.6E-08). Similarly, HG R1a1a-M17 was represented in
Elayathankudi_Kalanivasal and Erani Kovil, whereas Mathur_Karupur and Nemam
possessed HG R2-M124. The YSNP Nei gene diversity was thus nil for many of the
clans.
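
The statement that gene diversity is nil for clans fixed for one haplogroup follows directly from the definition. The sketch below assumes the usual unbiased form of Nei's gene diversity, n/(n-1) × (1 − Σp²); the exact estimator used in the thesis is not stated in this section, and the mixed clan is a hypothetical example.

```python
from collections import Counter


def nei_gene_diversity(haplogroups):
    """Unbiased Nei gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplogroups)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    freqs = [count / n for count in Counter(haplogroups).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


fixed_clan = ["F*-M89"] * 7                       # every sample in one haplogroup
mixed_clan = ["L1-M27/76"] * 4 + ["R2-M124"] * 4  # hypothetical 50:50 split

print(nei_gene_diversity(fixed_clan), round(nei_gene_diversity(mixed_clan), 4))
```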


4.6.2 Neighbour Joining tree computations 


NJ trees were computed based on both YSNP Fst and YSTR Rst distances
(Fig 62a, 62b). Three major clusters were observed in the NJ tree based on Fst distances.


Figure 61: Map of Chettinad area in Tamil Nadu showing locations of various Kovils
Legend: Clan temples; Taluk headquarters; Dist. headquarters

Table 35: NRY HG frequency (%) among the sub-clans of Nattukottai Chettiar
[Table body illegible in the source scan.]
SD: standard deviation
*: unclassified, data not available

Table 36: Results of Fisher's Exact Test (FET) to ascertain the reliability of HG
frequencies in different sub-clans of Nattukottai Chettiar
[Table body illegible in the source scan.]


Figure 62a: NJ tree based on NRY HG Fst distances for subclans of Nattukottai Chettiar
Leaves (sample size), in the order drawn: Mathur Karuppur (3), Nemam (10),
Elaiyathankudi Okkur (6), Vairavan Vairavan* (8), Vairavan Periyavakuppu (16),
Vairavan Thayanarvakuppu (3), Mathur Arumbakkur (7), Mathur Kulathur (4),
Elaiyathankudi Kalanivasal (9), Erani (19), Pillayararpatti (25), Mathur Manalur (11),
Surakudi (7), Elaiyathankudi Perumaruthurudayar (4), Elaiyathankudi Kinkinikkurudayar (3),
Elaiyathankudi Pattinasamy (8), Mathur Uraiyur (9), Ilupakudi (12). Scale bar: 0.1


Figure 62b: NJ tree based on NRY STR Rst distances for subclans of Nattukottai Chettiar
Leaves (sample size), in the order drawn: Elaiyathankudi Kinkinikkurudayar (3),
Erani (17), Pillayararpatti (23), Surakudi (5), Elaiyathankudi Perumaruthurudayar (4),
Ilupakudi (11), Mathur Manalur (10), Mathur Kulathur (4), Mathur Karuppur (3),
Mathur Arumbakkur (7), Elaiyathankudi Kalanivasal (9), Mathur Uraiyur (9),
Elaiyathankudi Pattinasamy (8), Elaiyathankudi Okkur (6), Nemam (8),
Vairavan Periyavakuppu (13), Vairavan Thayanarvakuppu (2), Vairavan Vairavan* (8).
Scale bar: 0.1
Note: The numbers in brackets indicate the sample size. The branch length indicates
the genetic distance.


Cluster 1 was characterised by Mathur Karuppur and Nemam, both having high
frequencies of HG R2-M124. Cluster 2 was characterised by Vairavan Kovil and its
sub-clans, having high frequencies of HG J2b-M221/102. Cluster 3 included Erani,
Pillayarpatti, Surakudi, sub-clans of Mathur Kovil (Manalur and Uraiyur) and
sub-clans of Elayathankudi Kovil (Perumathurudayar, Kinkinikuradayar and
Pattinasamy), mainly possessing HG L1-M27/76. Illupakudi, Mathur Arumbakur,
Mathur Kulathur and Elayathankudi Kalanivasal stood distinct.

The NJ tree based on YSTR Rst distances gave a different picture, with all the
Mathur sub-clans clustered together at minimal distances from the cluster formed by
the Elayathankudi sub-clans, Pillayarpatti, Surakudi and Illupakudi. The Vairavan
sub-clans formed a distinct cluster, similar to the NJ tree based on YSNP Fst distances.
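
As an illustration of how a frequency-based distance between two sub-clans can be derived, the following minimal Python sketch computes a pairwise Nei-style G_ST from haplogroup frequencies. It is a sketch under stated assumptions only: the Fst values behind Figure 62a were not necessarily computed this way (AMOVA-based estimators differ in detail), the function names are hypothetical, and the frequencies are invented for demonstration and are not taken from Table 35.

```python
def expected_diversity(freqs):
    """Expected diversity 1 - sum(p_i^2) for a dict {haplogroup: frequency}."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_gst(pop_a, pop_b):
    """Nei-style G_ST between two populations with equal weights.

    G_ST = (H_T - H_S) / H_T, where H_S is the mean within-population
    diversity and H_T is the diversity of the pooled (averaged) frequencies.
    """
    groups = set(pop_a) | set(pop_b)
    pooled = {g: (pop_a.get(g, 0.0) + pop_b.get(g, 0.0)) / 2 for g in groups}
    h_s = (expected_diversity(pop_a) + expected_diversity(pop_b)) / 2
    h_t = expected_diversity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Invented frequencies (assumption, for demonstration only).
fixed_r2 = {"R2-M124": 1.0}
fixed_j2b = {"J2b-M221/102": 1.0}
mixed = {"L1-M27/76": 0.5, "H1a*-M82": 0.5}

gst_disjoint = pairwise_gst(fixed_r2, fixed_j2b)   # no shared haplogroups
gst_identical = pairwise_gst(mixed, mixed)         # identical composition
```

Two sub-clans fixed for different haplogroups give the maximum distance, and two sub-clans with identical haplogroup composition give a distance of zero; a matrix of such pairwise values is the kind of input a neighbour-joining tree is built from.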


4.6.3 Phylogenetic Network and mismatch distribution analysis: 


In the reduced median phylogenetic network, HG F*-M89 was mainly represented in
Arumbakur, with haplotypes separated by single-step mutations. The mismatch
distribution showed a unimodal peak with a mean pairwise difference (MPD) of 1.5
(Fig 63a). In HG H1a*-M82, two distinct nuclei could be identified in the network.
YSTR haplotype sharing was observed between Illupakudi and Elayathankudi
Pattinasamy in one of the nuclei, while the other nucleus was over-represented by
Mathur Uraiyur. The mismatch distribution showed multimodal peaks with an MPD of
3.89 (Fig 63b), indicating diverse sources of HG H1a* among these populations.
HG J2a*-M410 was localised in Okkur and Nemam, separated by 8 mutations at YSTR
loci, with no haplotype sharing between them. The same was reflected in the
mismatch distribution, which was bimodal with an MPD of 6.77 (Fig 63c), indicating
two different sources of HG J2a* in these populations. HG J2b-M221/102 was
characteristic of the Vairavan Kovil sub-clans (Periyavakuppu and
Thayanaravakuppu). The mismatch distribution showed a unimodal peak with an MPD of
1.30, indicating a recent localised expansion within the Vairavan clan (Fig 63d).

Figure 63: Mismatch distributions of NRY HGs among Nattukottai Chettiar
a: NRY HG F*-M89 (MPD: 1.5)
b: NRY HG H1a*-M82 (MPD: 3.89)
c: NRY HG J2a*-M410 (MPD: 6.77)
d: NRY HG J2b-M221/102 (MPD: 1.30)
e: NRY HG L1-M27/76 (MPD: 5.44)
f: NRY HG R1a1a-M17 (MPD: 1.74)

HG L1-M27/76 was predominant in Perumathuradayar, Pillayarpatti, Surakudi and
Erani Kovil, whose haplotypes formed a cluster separated by single-step mutations
(Fig 64). Mathur Manalur formed another cluster with multistep mutations from the
central median vector. Illupakudi formed another distant and distinct cluster in the
network. The mismatch distribution showed a bimodal peak with an MPD of 5.44
(Fig 63e), indicating multiple recent sources and demographic expansion of the
haplotypes. The central node of HG R1a1a-M17 was mainly represented by
Elayathankudi Kalanivasal. Erani Kovil was separated from this cluster by a
one-step distance. The mismatch distribution showed a unimodal peak with an MPD of
1.74 (Fig 63f), indicating a single source of haplotypes and expansion among these
populations.
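
The mismatch distributions and MPD values discussed above can be illustrated with a minimal Python sketch. It assumes that the difference between two YSTR haplotypes is counted as the number of loci at which they differ; the software used for the thesis may have counted repeat-unit differences instead. The function names and the four-locus haplotypes are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def pairwise_difference(hap_a, hap_b):
    """Number of loci at which two haplotypes carry different alleles."""
    return sum(1 for a, b in zip(hap_a, hap_b) if a != b)


def mismatch_distribution(haplotypes):
    """Return (histogram of pairwise differences, mean pairwise difference)."""
    diffs = [pairwise_difference(a, b) for a, b in combinations(haplotypes, 2)]
    return Counter(diffs), sum(diffs) / len(diffs)


# Hypothetical four-locus YSTR haplotypes (repeat counts), for demonstration only.
haplotypes = [
    (14, 12, 28, 23),
    (14, 12, 28, 23),
    (14, 13, 28, 23),
    (15, 13, 28, 24),
]

counts, mpd = mismatch_distribution(haplotypes)
for n_diff in sorted(counts):
    print(n_diff, counts[n_diff])
print("MPD:", round(mpd, 2))
```

A histogram with a single peak at a low number of differences corresponds to the unimodal, low-MPD pattern reported for HG F*-M89 and HG J2b-M221/102, whereas well separated peaks correspond to the bimodal pattern reported for HG J2a*-M410.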

4.6.4 K mean clustering analysis 

An attempt was also made to assign individuals to hypothetical ancestral
populations by applying a Bayesian approach using both YSNP and YSTR information.
Iterations were run for K = 1 to 9 (number of populations) using the 'STRUCTURE'
software (Fig 65). The clans were best segregated at K = 8. The sub-clans were
structured based on their prevalent HG composition and the unique YSTR evolution
within each sub-clan, thus implying long term isolation and expansion of these clans.


4.6.5 BATWING analysis 


To explore the time depths of sub-clan differentiation, BATWING analysis was
employed (Fig 66). The phylogenetic tree obtained revealed that Erani Kovil and
Pillayarpatti had a recent split at 462 Ybp. This was consistent with the fact that
these two clans are referred to as 'brotherly clans' and do intermarry among
themselves. On the other hand, Nemam and Elayathankudi Okkur were outliers in the
tree, having a split time of 10,766 Ybp. This cluster diverged from the rest
4,096 Ybp. Arumbakur, having only F*, showed a split time of 12,944 Ybp from the
rest of the branches of the clan. Overall, all the clans showed a coalescence time
of 14,858 Ybp. It is an enigma how such high fidelity of an NRY HG to a clan came
into vogue.

Figure 64: Reduced median phylogenetic network of NRY HG L1-M27/76 among Nattukottai Chettiar
Legend: Elaiyathankudi, Erani Kovil, Pillayarpatti, Surakudi, Ilupakudi, Vairavan,
Periyavakuppu, Pattinasamy, Perumaruthurudayar, Kinkinikkurudayar, Manalur.
Note: Mathur Manalur forms a distinct cluster, whereas Pillayarpatti and Erani Kovil
showed shared haplotypes. Pillayarpatti and Erani Kovil are referred to as
'Brotherly Clans'. Perumathurudayar formed a clan specific cluster.

Figure 65: K mean clustering at K = 2 to 9
Note: The sub-clans were best segregated at K = 8, with the YHGs and YSTRs unique to
each sub-clan, indicating long term isolation and expansion.

Figure 66: BATWING based phylogenetic tree for sub-clans of Nattukottai Chettiar
[Tip labels and node ages not legible in the source.]


5. DISCUSSION 


The present study on India covered four states, viz. Gujarat (the Kutch peninsula
and the coastal route to India), Maharashtra and Karnataka (the upper Western Ghat
ranges along with the rain shadow hill ranges and plains) and lastly Andhra Pradesh
(the eastern part of the Deccan, known for its Godavari and Krishna river-fed
agriculture). Along with earlier studies from this laboratory (Wells et al., 2001;
Kavitha, 2008; ArunKumar et al., 2012; ArunKumar, 2012), it has given some definite
clues on the peopling of the Deccan. This might have occurred much before the
so-called dispersal of the 'Vedic' people from the Indo-Gangetic Doab. The earlier
studies revealed that a structured society pre-existed the introduction of the
Varna system in Tamil Nadu. This might be true even in other parts of the Deccan.
I present my observations and arguments in 5 different chapters of discussion.

5.1 The parameters of isolations in various study regions differ 

In an attempt to decipher the peopling of the four states, which might be vital for
telling the genetic history of the whole of India, the first question posed was
whether the factors that influenced the social structure as castes and tribes, and
that correlated with the NRY (i.e., male migration), were the same in the four study
states. The geophysical properties of the four states are very different. Gujarat is
a monsoon-dependent, mostly dry belt that was the gateway to the first and probably
subsequent migrations of Man. Maharashtra is the starting point of the Western Ghats
and has many historical entry landmarks. Human habitation in this terrain and in
Karnataka was essentially supported by the south-west monsoon. The Western Ghats,
extending from south Maharashtra to the Cape of India, and particularly the rain
forests of Kerala, are one of the biodiversity hotspots of plants and animals. Even
today, nomadic hunter gatherers like some Paliyans live isolated in the
Aliyar-Parambikulam areas of the Western Ghats. Andhra Pradesh is unique in terms of
its perennial river-fed, rice-cultivating populations. Any initial population
settlements must have taken place in an environment more suitable for human
habitation, food procurement and survival. Thus the roles of geography, subsistence,
social features and languages in shaping the NRY gene pool of the study region were
evaluated, particularly by employing AMOVA. The parameters stratifying a state
(province) population were not the same for all states. The observations were:

1. The degree of genetic variation between caste and tribe was high among
Gujarat populations. Further, Northern Gujarat comprised mostly caste
populations, while Southern Gujarat comprised mostly tribal populations.

2. Language and geographical barriers determine the NRY composition of the
Maharashtra populations studied.

3. The Karnataka study populations, mostly from the southern hilly districts,
showed no one-to-one correlation with language, geography, caste-tribe divide,
subsistence or other social characteristics.

4. The populations of Northern Andhra Pradesh studied were structured based
on their mode of subsistence. The populations of the Godavari and coastal belt
(fishermen, farming communities, warriors, Brahmins etc.) are well structured
and live in sympatric isolation.


A large number of sociologists, anthropologists and geneticists have attempted to
decipher whether castes and tribes were derived from a common ancestor or not.
Kivisild et al. (2003) suggested that the caste and tribal populations of India may
have common origins, while another study (Cordaux, 2004) suggested different
origins. The present study on Gujarat populations supports the latter view. All the
study populations were IE speakers (including Sindhi, Kutchi or Gujarati dialects)
and hence the language barrier was not the deciding factor. Geographically, the
majority of the caste populations were present in the Kutch belt of Gujarat with a
high proportion of R1a1a-M17, whereas both caste and tribal populations were present
in the Sourashtra region. The majority of tribal populations were localised in the
Narmada valley and possessed more of the H clades (H1a*-M82, H2-Apt) (Table 3,
Fig 9). There was a continuum, and the north Maharashtra tribes possessed
appreciable frequencies of H clades. Though geography fit well, the caste-tribe
divide gave the maximum AMOVA value here (Table 4). The castes and tribes showed a
coalescent time of ~7.4 Kya. Travelling down below the Narmada valley, one reaches
the Satpura and Gondwana hills of Maharashtra. The Central Dravidian speaking tribes
of Gondwana (Gonds and Raj Gonds) showed overwhelming frequencies of NRY HG
H1a*-M82 (~70%). The Korku, the westernmost AA speaking tribe, live in the Satpura
hills amidst other CDR language speaking tribal groups of this region; they have
maintained their language identity and also possessed appreciable frequencies of
NRY HG O2a-M95, a marker for AA language speakers (Reddy et al., 2007). The IE
speakers localised in the Sahyadri region showed relatively lower frequencies of HG
H1a*-M82 and were distinct from the Gonds and Korku linguistically, geographically
and in NRY HG composition. Hence geography and language seem to have co-influenced
the isolation of the populations of Maharashtra. Thangaraj et al. (2010) have also
reported geographical barriers as an important factor in shaping the NRY profile of
the Maharashtra populations, whereas mtDNA did not contribute to this distinction.

The Sahyadri ranges (the west side of the Western Ghats) of Maharashtra extending
to Karnataka were characterised by NRY HGs R1a1a-M17, J2a*-M410 and H clades. The
majority of the populations of the Western Ghats and coastal regions of Karnataka
also showed HGs R1a1a, H1a*, R2 and J2a. But as we proceed towards the interior and
eastern slopes of the Western Ghats, Kannada speaking, food foraging populations
such as Yerava and Jenukuruba were characterised by high frequencies of HG F*
(23.1% and 21.9% respectively). The majority of tribal populations of Tamil Nadu
were also localised in these eastern slopes of the Western Ghats (Nilgiri hills).
These food foraging populations also showed very high frequencies of HG F* (53.25%)
(ArunKumar et al., 2012). The Kannada speaking tribes of the Nilgiri hills showed
23.62% of HG F*, a frequency similar to that of Yerava and Jenukuruba of Karnataka.
NRY HG F* is seen sporadically in other parts of India (unpublished) and not in
other parts of the world. Consequently, the highest YSTR variance and ASD age
estimates (Table 37) of HG F* in Tamil Nadu (0.779; 32,700 ± 5,700 years), followed
by Karnataka (0.686; 27,500 ± 4,700 years), indicated this Western Ghat region of
Tamil Nadu and Karnataka to be the earliest settlement of these Dravidian speaking
tribal populations in India (Kavitha, 2008; ArunKumar et al., 2012).
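
The logic of an ASD based age estimate can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure actually run for Table 37: the founder haplotype is taken as the per-locus median, the default mutation rate of 6.9 x 10^-4 per locus per 25-year generation is an assumed value of the kind commonly used with this approach (this section does not state the rate used), and the haplotypes and function names are hypothetical.

```python
from statistics import median


def asd_to_founder(haplotypes, founder=None):
    """Average squared difference (ASD) in repeat counts between each
    haplotype and the founder haplotype, averaged over loci and samples."""
    n_loci = len(haplotypes[0])
    if founder is None:
        # Assumption: use the per-locus median as the founder haplotype.
        founder = [median(h[i] for h in haplotypes) for i in range(n_loci)]
    total = sum(
        (allele - f) ** 2
        for hap in haplotypes
        for allele, f in zip(hap, founder)
    )
    return total / (len(haplotypes) * n_loci)


def asd_age_years(asd, mutation_rate=6.9e-4, generation_years=25):
    """Age in years = ASD / mutation rate per generation * years per generation.
    Both default values are assumptions, not values stated in this section."""
    return asd / mutation_rate * generation_years


# Hypothetical two-locus haplotypes (repeat counts), for demonstration only.
haplotypes = [(14, 12), (14, 13), (15, 12), (14, 12)]
asd = asd_to_founder(haplotypes)
age = asd_age_years(asd)
print(f"ASD = {asd:.3f}, age = {age:.0f} years")
```

Because the age scales linearly with ASD, a lineage that has accumulated more YSTR diversity around its founder haplotype receives an older estimate, which is the reasoning behind ranking Tamil Nadu ahead of Karnataka for HG F*.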

It was hypothesised by Foote (1876) that early human habitation dating back to the
Palaeolithic was not possible in the Western Ghats due to heavy rainfall and thick
vegetation. Alternatively, it was also hypothesised that this rich vegetation zone
must have attracted early humans because of the easy availability of resources, but
this lacks corresponding evidence (Chauhan, 2010). The present study and those from
this laboratory confirm that the Western Ghats were occupied as early as the Upper
Palaeolithic age. The presence of high frequencies of F* in the Nadar populations of
Tamil Nadu makes one wonder whether, upon agricultural expansion into previously
non-cultivated areas, the tribal populations might have shunned the newcomers and
taken shelter in more isolated, preferably forested and mountainous, ranges, thus
retaining their mode of subsistence and genetic distinctiveness until the present
day. A similar hypothesis of taking shelter in the mountainous ranges was proposed
by Sanghvi and Karve (1981) as well.

Table 37: YSTR variance and ASD ± SE based age estimates for various HGs among study states
[Table body not legible in the source.]
Note:
1. Var: YSTR variance over 17 loci
2. ASD: Average Square Difference
3. SE: Standard Error
The ASD based age estimates have been calculated by employing the method described
by Sengupta et al., 2006. High STR variance values can also be due to diverse STRs
and low sample size, hence these need to be considered cautiously.

An effort was made to study the south Karnataka populations in the mountainous
ranges adjacent to the Nilgiris, particularly Kutta and the Mangalore surroundings.
Northern Karnataka was purposefully omitted for many reasons, such as the influence
of the Vijayanagara Empire and the resultant later developments in population
dynamics. Most of the populations, except three of the tribal populations, were very
diverse with high variance, but no admixture was detected for at least the past
~4.6 Kya (Fig 29). The study populations showed no correlation with language,
geography or other social characteristics in the AMOVA analysis, and each population
had its distinct genetic legacy and expansion. For example, the age of the
agriculture-based populations (Adikarnataka, Kuruba, Gowda, Mogaveera and Kodava)
obtained in the present study was ~4.6 Kya, and this correlated with the 'ash mound
tradition' of the Southern Neolithic age, characterised by agro-pastoral activities
(Boivin et al., 2008). The major crops cultivated during this period were native
millets and pulses (Fuller, 2006). Genetic studies by Rajkumar and Kashyap (2004) on
four populations of Karnataka using 15 autosomal loci also did not reveal any
linguistic or geography based clustering. They also inferred from their studies that
the populations either had a common ancestry or had experienced very high gene flow.

The region of Kerala studied by Kavitha (2008) is a very interesting region of
India, with a lot of population movements and long-standing maritime trade with the
Roman Empire. Kerala showed even frequencies of different NRY HGs and low Rst and
Fst distances, indicating significant genetic amalgamation of various castes. The
present study, along with earlier ones from this laboratory, also suggested that the
evergreen Western Ghats were the preferred settlement sites for many ancient tribes
and the later arriving stratified castes. Several studies have suggested a common
genetic signature among distantly ranked caste populations in South India (Sanghvi
et al., 1981; Watkins et al., 2008). The NRY studies on Tamil Nadu populations
showed evidence for genetic structuring based on mode of subsistence and also
suggested the existence of social stratification in Tamil Nadu prior to the
establishment of the Varna system (Arunkumar et al., 2012). This provided a
classical example of societal formation. The genetic impact of the Varna system on
pre-existing populations was minimal. A similar picture emerged from the present
studies on Andhra Pradesh: the agricultural basin of the Godavari belt revealed a
situation similar to that of Tamil Nadu.

Previous studies based on autosomal markers, the Y chromosome and mtDNA have
attempted to classify the populations of Andhra Pradesh based on social (upper,
middle and lower) and Varna (Brahmin, Kshatriya, Vaishya and Shudra)
classification. The study by Bamshad et al. (1998) showed that mtDNA based (N=250)
genetic distances correlated with social rank, whereas Y chromosome genetic
distances did not. Their study concluded that the variation in the Y chromosome is
the result of mutation and drift. The movement of females across social ranks as a
result of hypergyny has resulted in social stratification. In contrast, studies by
Ramana et al. (2001) (N=204) and Cordaux (2004) suggested heavy gene flow among the
populations of Andhra Pradesh. Another study based on autosomal microsatellite loci
(Reddy et al., 2005) with a large sample size (N=948) suggested either recent
common ancestors for the populations of Andhra Pradesh or extensive gene flow among
these populations that erased the original genetic differences. Their results
suggested a lack of significant genetic differentiation based on social
stratification. Most of the earlier studies were based on smaller sample sizes
covering a wider area. The present study, considering a definite geography (the
Godavari belt) and a larger sample size of 774, did not observe any such heavy gene
flow for at least ~2 Kya in the majority of the populations investigated.

Conclusions drawn on the process of establishment of the Varna system and its
impact on genetic systems are contradictory. The establishment of this system has
not been uniform in India (Champakalakshmi, 2001). The present study observes that
tribal populations such as Konda Reddy and Konda Kammara have remained isolated from
their neighbouring populations in the Eastern Ghats for at least 8,238 and 5,990
years respectively, with minimal or no gene flow, predating the establishment of the
Varna system (i.e., 1.2 Kya). The ages of the Brahmin groups, in contrast, correlate
with the arrival of IE speakers in South India (~3 Kya). The ages of other farming
populations such as Kapu and Yadava correlate with the cultivation of pulses and
rice in the Eastern Ghats region between the Godavari and Krishna rivers (Fuller,
2006) and with animal domestication, respectively. This age also marks the spread of
rice in south India (Fuller, 2002). Similarly, Gujarat was one of the earliest known
regions for food producing complexes of the Harappan region (Liversage, 1989). The
early Harappan origins started in North Gujarat around 5,000 years ago. This
corresponds with the cultivation of native millets in north Gujarat and Sourashtra
(Fuller, 2006). The ages of the study populations of Gujarat and Andhra Pradesh thus
correlated with the period of millet and rice cultivation, which probably supported
effective population expansion in these regions, and the dates obtained for these
expansions in the present study reiterated the existence of agricultural societies
in these regions well before the Varna system itself.

It is imperative to discuss the populations of Orissa and the North Eastern region.
Most of the populations in these two regions speak Austro-Asiatic (AA) languages,
with minor proportions speaking Tibeto-Burman and Central Dravidian languages.
Studies from this laboratory on a total of 2,558 samples from these two regions have
clearly shown that the AA languages developed along with NRY HG O2a-M95. Presumably
this clade and language originated in the Lao region and expanded into parts of
India (Arunkumar, Lui Hui et al., unpublished data). Thus the presence of O2a in
Korku, an Austro-Asiatic speaking tribe from Maharashtra, marks the westernmost
limit of AA speakers and in all probability, as suggested by the network and other
observations, might be the result of a back migration from the Lao, Orissa and
Central India regions.

Though Lao is considered the place of origin or early settlement and successful
expansion of O2a and AA language speakers, one cannot ascertain, from the available
methodologies and approaches, the exact location of the O2a mutation. Nonetheless,
many studies based on autosomal SNPs, the Y chromosome and mtDNA have suggested an
origin of AA speakers in South East Asia (Chaubey et al., 2011). Studies on tribes
of Madhya Pradesh, on the contrary, suggested that the linguistic label does not
unequivocally follow the genetic imprints (Sharma et al., 2012). Certain other
studies in fact proposed an in-situ origin of AA speakers in India and proposed a
missing link between South and South-East Asian populations (Basu et al., 2003;
Kumar et al., 2007; Reddy et al., 2007). The present observation on the extent of
distribution from Maharashtra, along with those observed earlier from this
laboratory, however presents a clear case of language and NRY HG affiliation.

Though the IE and DR languages showed clear distributions and NRY associations,
these were skewed by many populations that were either miscegenated or
language-replaced to varying degrees. A classical example obtained in the present
study was the Patel and Koli castes of Gujarat, which clustered with tribes and
showed many tribal cultural characteristics, as elaborated in the respective section.


To conclude, several earlier studies either included populations from diverse
geographic locations of India to interpret the genetic structure of Indian
populations, or studied populations from a specific geographical location and
extrapolated the observations to the Indian subcontinent or to north or south India.
In contrast, an attempt was made here to study the genetic structure of the entire
Deccan (Maharashtra, Karnataka, Kerala, Tamil Nadu, Andhra Pradesh and Orissa) along
with that of Gujarat, the gateway to India (Table 38), by studying 59 well defined
castes and 45 tribes. The study reiterated our contention that each region or state
of India needs to be considered in the context of its socio-geography and cultural
characteristics so that the real factors shaping the gene pool are understood. The
attempt paid dividends by identifying the factors responsible for the sympatric
isolations observed. Gujarat and Maharashtra populations were structured based on
the caste-tribe divide and/or geography, whereas the Karnataka populations showed no
one-to-one correlation with any of the social, language or geography parameters. The
Andhra Pradesh demes were structured similarly to the Tamil Nadu populations; the
structuring thus occurred well before the introduction of the Varna system into
these regions, and agricultural settlements, as suggested by Fuller et al. (2006),
played a major role in this structuring. While the AA speakers live in Orissa and
the northeast belt, the Dravidian speakers seem to have evolved in the Deccan. As
revealed by the present study, the H clades seem to have characterised a much
dispersed ancient population and are characteristic of the Deccan, Central India and
Himachal regions, while L1 characterises the south Dravidian speakers. Whether the
central Dravidian speakers were ancient to the south Dravidian speakers or vice
versa needs to be further investigated. The scenario might have emerged due to
different sequences of migration, isolation and evolution of these language families
and their gene pools in India. The PCA plot depicting the above argument is shown in
Fig 67. Hence the parameters that determine the structuring of the people of the
Deccan are not uniform, and one cannot consider the whole of India as a single
entity with a given preconceived template.

Table 38: NRY based AMOVA of various sub groupings for Deccan Indian Populations

Sub groupings | No of groups | SNP Fsc | SNP Fst | SNP Fct | STR Fsc | STR Fst | STR Fct
[row label not legible] | [not legible] | 0.1320 | 0.2175 | 0.0985 | 0.1085 | 0.1658 | 0.0643
Sahyadri, east of western ghats, south deccan, [not legible], satpura, narmada valley, upper deccan (east [not legible]) | [not legible] | 0.1043 | 0.1577 | 0.0597 | 0.0818 | 0.1268 | 0.0490
Sahyadri, east of western ghats, south deccan plains, central deccan plains, eastern ghats, gondwana, satpura, narmada valley, west deccan, east deccan | 10 | 0.1061 | 0.2028 | 0.1081 | 0.0852 | 0.1566 | 0.0780
Subsistence: hunter, domestication, agriculture, artisan, [not legible], warrior, brahmin (AA excluded) | [not legible] | 0.1225 | 0.1484 | 0.0296 | 0.0984 | 0.1184 | 0.0222
DR populations: [row label not legible] | 9 | 0.0705 | 0.1400 | 0.0747 | 0.0558 | 0.1178 | 0.0657
IE caste tribe (Parsee removed) | [values not legible]
IE caste SC tribe (Parsee removed) | [values not legible]
Based on subsistence: Br, agriculture caste, [not legible], agriculture, pastoral, SC (Parsee removed) | [not legible] | 0.0858 | 0.1491 | 0.0693 | 0.0767 | 0.1189 | 0.0457

* AA speakers were eliminated in this AMOVA analysis as they could artificially induce false positive results
* AA: Austro Asiatic
* IE: Indo European
* DR: Dravidian
* CDR: Central Dravidian
* SCDR: South Central Dravidian
1. AA speakers were eliminated to avoid the false variation caused by these populations
* Parsee and Siddi were removed as they were migrant populations from Iran and Africa respectively
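
As a reading aid for Table 38, the three AMOVA fixation indices are linked by the identity (1 - Fst) = (1 - Fsc)(1 - Fct), so any row can be checked for internal consistency. The short Python sketch below does this for two rows. The values are as they appear to read in the scanned table, the column order Fsc, Fst, Fct is taken from the table's header row and is an assumption, and the function name and row keys are hypothetical.

```python
def fst_from_components(fsc, fct):
    """Fst implied by the AMOVA identity (1 - Fst) = (1 - Fsc) * (1 - Fct)."""
    return 1.0 - (1.0 - fsc) * (1.0 - fct)


# (Fsc, Fst, Fct) as read from Table 38; column order is an assumption.
rows = {
    "geography, 10 groups (SNP)": (0.1061, 0.2028, 0.1081),
    "subsistence, AA excluded (SNP)": (0.1225, 0.1484, 0.0296),
}

for label, (fsc, fst, fct) in rows.items():
    implied = fst_from_components(fsc, fct)
    print(f"{label}: reported Fst = {fst:.4f}, implied Fst = {implied:.4f}")
```

Under this reading, Fct is the share of variation among the groups being compared, so the lower Fct of the subsistence grouping relative to the ten-region geographic grouping means that, across the Deccan as a whole, subsistence separates the populations less sharply than geography does.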


5.2: Various tribes and castes of a given region may have different origins: 


India is the second most populous country, with 1.21 billion people (2011 census).
Tribes constitute 8% of the total Indian population. There are ~450 tribal
communities in India (Singh, 1992), who speak ~750 dialects (Kosambi, 1991). They
differ in their geographical distribution and display diversity in terms of
demographic parameters such as habits, customs, beliefs, subsistence, language and
ethnicity (Bhasin, 2006). It is also believed that tribes are possibly the original
inhabitants of India. In recent years, several studies have addressed the origin and
antiquity of castes and tribes. Krithika et al. (2009) have proposed two paradigms
for explaining the linguistic and ethnic affiliation of the tribal populations:

A. The early migrants or settlers had a common origin and spoke a common
language. In the course of time, on dispersing into different geographical
areas, they acquired different languages due to cultural diffusion,
separation or isolation.

B. Alternatively, diverse endogamous groups speaking different languages
settled over the same or a contiguous geographical expanse at different
times, and in due course their language may have been overlaid by an
adapted or acquired local language. Still, these diverse endogamous
groups may retain their biological identity.


Language shift was a common phenomenon in these cases. For example, the Mushar, an
originally AA speaking population from Uttar Pradesh (Chaubey et al., 2008),
revealed a closer genetic affinity to neighbouring AA speakers than to IE speakers,
but had undergone a language shift to Hindi (IE family). The mtDNA and Y chromosomal
analyses thereby suggested that such linguistic shifts may not necessarily be a
signal of rapid genetic admixture, either maternally or paternally.

Figure 67: PCA plot based on Fst distances among populations from various regions of Deccan
Note: The populations of north and south Orissa are distinct from the other Deccan
populations, indicating distinct migratory and evolutionary patterns among these
populations as compared with other Deccan populations.

Certain tribes have historical documentation of their migration into India, the
classical example being the Siddi (N=37). Non-indigenous markers such as HGs B-M60
(5.4%), BR-M139 (18.5%) and CR-M168 (54.1%) were identified in this population and
were absent among all the other study populations. These HGs have been identified in
the Central Sahel in Africa (unpublished data). However, the YSNP HG gene diversity
of the Siddis was calculated to be 0.6757 ± 0.0736, similar to that of the other
tribes within Gujarat (Results: section 4.1). This gives a clue to the assimilation
of indigenous HGs in the place of settlement. Other genetic studies also reveal the
presence of non-African genetic markers that the Siddis may have assimilated from
local Indian populations (Ramana et al., 2001; Shah et al., 2011).
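
A gene diversity value of this kind can be illustrated with Nei's unbiased estimator, H = n/(n - 1) x (1 - sum of squared haplogroup frequencies). The Python sketch below is a minimal example under stated assumptions: it does not reproduce the Siddi estimate (the full haplogroup counts are not listed in this section), it omits the standard error, and the sample and function name are hypothetical.

```python
from collections import Counter


def gene_diversity(haplogroups):
    """Nei's unbiased gene diversity for a list of haplogroup assignments."""
    n = len(haplogroups)
    counts = Counter(haplogroups)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample of six individuals, for demonstration only.
sample = ["H1a*-M82", "H1a*-M82", "R1a1a-M17",
          "J2a*-M410", "J2a*-M410", "J2a*-M410"]
h = gene_diversity(sample)
print(f"gene diversity = {h:.4f}")
```

The estimator approaches 0 when a single haplogroup dominates the sample and approaches 1 when many haplogroups occur at similar frequencies, which is why a migrant group that has absorbed local haplogroups can show diversity comparable to that of its neighbours.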

The time of settlement of the Siddis was examined using the ASD age estimates: the
ages of HGs B-M60 and BR-M139 in the Siddis of Gujarat were 3.7 ± 1.2 Kya and
10.4 ± 3.3 Kya respectively, whereas the suggested time of entry of these
populations was the 15th-19th century (Shah et al., 2011). The other HGs present in
this population were not statistically significant. Shah et al. (2011), studying the
autosomal and uniparental markers of Siddis (N=154) of Gujarat and Karnataka,
suggested that the Bantu speaking populations from sub-Saharan Africa migrated
towards the Indian subcontinent with the agricultural expansion from central western
Africa. Overall, these results from both NRY and other genetic markers have
confirmed the non-Indian origin of the Siddis.

The present study on the Korku, a Maharashtra tribe and the westernmost AA speaking
population of India, showed a high frequency of HG O2a-M95, similar to the other AA
speaking populations of Orissa, with an ASD age of 9.5 ± 2.4 Kya. The YSNP gene
diversity was low (0.4397 ± 0.0983) and the Fst/Rst distances of the Korku from its
neighbouring populations were high. All these indicate isolation of the Korku among
the Satpura hills and a distinct ancestry.

Genetic similarities between two adjacent populations speaking different languages
are not uncommon in India, in spite of the endogamous nature of most of the
populations: they might be due to fission, with one population adopting a different
language. The Kolam, speaking a CDR language, are distributed along the Satpura
hills and showed equal proportions of HG L1-M27/76 (a marker for Dravidian speakers,
as suggested by Sengupta et al., 2006), HG O2a-M95 (a marker for AA speakers) and HG
R2-M124 (unique to India and distributed in many warrior populations). This was
evident in the PCA and MDS plots (Results section 4.2), and the YSNP gene diversity
of the Kolam was also high (0.8917 ± 0.0271). The Y-SNP data thus indicated a shared
ancestry of the Kolam.

Surprisingly, HG H1a*-M82, an early and successfully expanded HG across most of India, was ten times less frequent here than in other CDR speakers such as the Gonds of Maharashtra. One has to remember that the tribes need not be exclusively Dravidian or AA speakers. All the tribes of Gujarat were IE speakers, distributed along the Narmada valley and the Sahyadri region of Maharashtra. These tribes showed high proportions of HG H1a*-M82 (>20%), a marker for early settlers, and of H2-Apt (9%).

The ethnographical details of the Gujarat and Maharashtra populations suggested that the practice of cross-cousin marriage by certain endogamous populations (such as Kathodia, Kotwalia and Maldhari in the present study) was similar to that of Dravidian populations (Southworth, 2005; Trautmann, 1981). R1a1, which is the commonest HG in IE-speaking Brahmin-related populations, was absent or low in these tribes. Thus this might be a clear case of language replacement in ancient populations. Similarly, the Koraga, an SDR tribe of Karnataka inhabiting the regions of Udupi and Mangalore, also showed 89% of HG H1a*-M82 with an ASD age estimate of 9.9 ± 3.7 Kya. This implies that Dravidian cultural practices persisted in these areas and that the language shift could not change their cultural characteristics, a unique phenomenon of the tribals of the world and India. The most probable explanation for the language shift in the Gujarat and Maharashtra tribes is that the incoming population was larger in size and the whole of the surrounding region spoke a more dominant (IE) language, so that the local tribes adopted this new language. Further in-depth study is warranted to understand how the language shifts occurred in spite of the retention of cultural characteristics.

Genetic dissimilarities between tribes of the same language family can be explained in many ways, but the most probable explanation is that genetically disparate tribes adopted a language in their new place of settlement or under the influence of invasions. The Jenukuruba and Yerava tribes, speaking Kannada dialects (SDR language family), live in the eastern part of the Western Ghats (Coorg) and showed a statistically significant proportion of HG F*-M89 (>20%), with ASD estimates of 12.1 ± 4.1 and 32.3 ± 7.5 Kya respectively. Similarly, HG F*-M89 in the Telugu-speaking tribe of the Eastern Ghats (Konda Reddy) was seen at a frequency of 21.6% with an ASD age of 16.4 ± 3.3 Kya. The presence of HG H1a*-M82 was also not significant in these tribes. Konda Reddy and Konda Kammara, in contrast to the Western Ghats tribes, also showed high frequencies of HG O2a-M95 (27% and 32.6% respectively). This HG was present at very low frequencies in the other study tribes (except the Korku, an AA tribe) as well as in the Brahmin and farming-related populations of Andhra Pradesh. The BATWING phylogenetic tree revealed long-term isolation of these tribes from their neighbouring populations in the Godavari belt. HG O2a-M95 is one of the frequent haplogroups in the regions of Orissa that lie adjacent to Andhra Pradesh. The ASD estimates of HG O2a among these tribes were 14.6 ± 2.2 and 24.2 ± 7.2 Kya respectively, similar to those observed in the AA-speaking tribes of Orissa (Table 37). Earlier studies on genetic markers and anthropometric variables also showed the genetic similarity of the Oraon and Mal Paharia, non-Mundari-speaking groups, to Mundari speakers. Therefore further study of the affinities of these tribes with the Orissa populations is warranted.

Sometimes a caste with a distinct mode of subsistence such as agriculture may rank and cluster with the tribal populations of the region. This was true of the Patel, an agriculture-based caste population, which clustered with the Koli, a population that subsists on fishing, and with other tribes, while all the other Brahmin-related castes clustered distinctly in the BATWING tree. They (Patel and Koli) exhibited higher YSNP gene diversity (0.8341 and 0.6212 respectively) than other caste populations in that region, and also showed low frequencies of R1a1a-M17 (statistically non-significant), higher frequencies of HG H1a*-M82 (26.8% and 31.6%) and higher Fst/Rst distances in comparison to other caste populations. The BATWING tree showed them clustering together. Thus the caste names christened by British ethnologists and anthropologists might have been based on cultural characteristics; presumably a population fissioned and drifted in different directions, acquiring various cultural characteristics from the surrounding populations. This is not uncommon in the history of mankind. It indicates that the caste/tribe divide may be a product of cultural evolution, depending particularly on the mode of subsistence and on the opportunities available to a population to diversify its occupation and eke out its living. Here the in-migration of a culturally different population, such as the Indo-European (IE) language speakers, seems to have played a major role.

5.2.1: Not all the Brahmin populations had common ancestry:

It is generally believed that ‘Brahmin’ is a ‘large’ word embracing a large entity of putative Central Asian migrants, or of people originating in the Hindu Kush ranges, who settled during the late Harappan phase and spread to the Indo-Gangetic Doab and other parts of India. The study by Arunkumar (2012) suggested an in-situ origin of R1a1 in India, and the spread of IE languages to India as a result of cultural rather than genetic diffusion. The present study, comprising many Brahmin populations together with those available in the literature, gave a chance to interpret these in terms of the NRY and in the light of male-mediated migration.

The four states studied included many Brahmin populations, and not all of them showed a similar NRY HG composition. Brahmins of Gujarat showed high proportions of HG R1a1a-M17 with an ASD age of ~5 Kya, and in the phylogenetic network (Fig 14b) the Brahmin-related IE-speaking populations clustered distinctly from other populations and from tribal populations. A similar pattern of distinct clustering of IE-speaking Brahmin-related populations (including Parsee and Maratha) and tribal populations was observed in Maharashtra as well.

Karve's (1961) work also indicated that each of the different Brahmin castes (Chitpavan, Sarasvat, etc.) in Maharashtra probably has a different origin. Parsees showed close affinity with Brahmin Desastha and Brahmin Chitpavan in the NJ tree but not in the BATWING tree, which raises a question about their origin and about the period of isolation and inbreeding they have experienced. According to the Qissa-i Sanjan, Parsees are thought to have migrated from Khorasan (ancient Parthia) to avoid persecution by Arabs. MtDNA analysis showed a very high frequency of haplogroup M among Parsees (55%), similar to that of Indian populations and much higher than that of a combined Iranian sample (1.7%), highlighting the derivation of their maternal component from autochthonous Indian mtDNAs. McElreavey and Quintana-Murci (2005) reported an admixture estimate of 100% from India. Qamar et al. (2002) suggested an Iranian origin based on their Y chromosome analysis (N=90). The present study showed that Parsees (N=86) possessed a high (statistically significant) proportion of HG J2a*-M410 (33.7%), with a higher ASD age (30.4 ± 7.6 Kya) than all other study populations (Appendix 11). This HG has been suggested to have an exogenous origin (Sengupta et al., 2006).

The story of the Brahmin populations in Karnataka and Andhra Pradesh was quite diverse, as evident in the PCA, MDS and BATWING phylogenetic trees for these states. The BATWING tree on NRY data reveals that the Havyaka Brahmins had remained isolated for at least 5-7 Ky from other Brahmin populations of Karnataka such as Goud Saraswath and Iyengar, and that they cluster interspersed with populations of diverse modes of subsistence and cultural characteristics.

The oral migratory history of the Havyaka Brahmins states that 32 families migrated from Ahicchatra (in present-day Uttar Pradesh) into the Banavasi region of Karnataka only during 345-360 AD, under Kadamba rule. Painted Grey Ware pottery was also first found at Ahicchatra in the Bareilly district of Uttar Pradesh (Ghosh and Panigrahi, 1946). Presumably they diverged and stayed isolated from other Brahmin groups well before arriving in Karnataka. The Saraswath Brahmins, whose oral history states that they migrated from the Saraswathi river basin, gave an age estimate of 4,604 years in the BATWING analysis. Similarly, the Brahmin populations of Andhra Pradesh were also very diverse. Brahmin Dravida were marked by high proportions of HGs J2a*-M410 (24%) and G-M201 (27%), whereas Brahmin ANV were marked by a significant proportion of HG R1a1a-M17 (55%). It is interesting to note that these two Brahmin populations did not cluster with each other; rather, they showed closeness to other agricultural populations of Andhra Pradesh. Thus it is possible that the Brahmin populations either differentiated in India or had diverse origins.

The present study thus proposes that the different caste and tribal populations of the study states in particular, and of the whole of India in general, could have been derived either from a common gene pool or from different origins and migratory patterns; nonetheless, one cannot categorically say that tribes were all different from castes. Long-term isolation and expansion have been observed in both the tribal and the caste populations studied. Language cannot be used as a defining criterion or proxy in these study states. The statement of Karve (1961), “it is not generally realised that the caste society in a sense was a very elastic society”, has not been appreciated by many recent workers, who have considered these two as watertight compartments. A caste bearing the same name may have very different origins in different geographical regions. There are examples in which a tribe dispersed over a large geographical region took up different occupations in different sub-regions and “fitted” itself into the caste hierarchy on different rungs. Similarly, different castes may have different origins. Thus, the origin of caste populations may not be uniform over the entire Indian geographical space.

5.3 The Distribution of L1 and the story of Dravidian: 
5.3.1: The Dravidian: 

The story of the Dravidians is a great enigma that has long defied a definite answer. Dravidian people or peoples are terms used to refer to the diverse groups of people who natively speak languages belonging to the Dravidian language family. Populations of around 220 million speakers are found mostly in Southern India. Other Dravidian people are found in parts of central India, Sri Lanka, Bangladesh and Nepal. The Dravidian language family consists of 85 languages, spoken by about 217 million people. The most populous Dravidian peoples are the Tamils, Telugus, Kannadigas and Malayalis. Smaller Dravidian communities with 1-5 million speakers are the Tuluvas, Gonds and Brahui (Krishnamurti, 2003).

Dravidian languages are native to India and have been attested epigraphically since the 6th century BCE. Only two Dravidian languages are spoken exclusively outside India: Brahui, and Dhangar, a dialect of Kurukh. Dravidian place names (onomastics) have been studied in modern times, and clusters of Dravidian place names have been found along the northwest coast in Maharashtra and Gujarat and, to a lesser extent, in Sindh, including the Indus valley settlements, and Pakistan (Balakrishnan, 1993). Dravidian grammatical influences such as clusivity are found in Marathi, Gujarati, Marwari and, to a lesser extent, Sindhi, suggesting that Dravidian languages must once have been spoken more widely across the Indian subcontinent. For this reason the present study also included the states of Maharashtra and Gujarat in the attempt to identify the genetic similarities of these people.

While a number of earlier anthropologists held the view that the Dravidian people together were a distinct race, a number of recent genetic studies based on uniparental markers have challenged this view. Although in modern times the majority of Dravidian speakers occupy the southern peninsula, the Deccan, nothing is definite about their ancient distribution (Fig 68). However, it is well established that the various Dravidian language speakers must have been widespread throughout India in ancient times, as supported by the presence of language isolates throughout India even today. The pattern of distribution of IE language speakers and their NRY chromosomal distributions (Arunkumar et al., 2012; Sharma et al., 2009) makes one seriously consider whether it was a very successful expansion of the IE speakers in the Indo-Gangetic Doab that pushed the Central Dravidian speakers into the forests of the Madhya Pradesh and Orissa belt.

Figure 68. Map of the Dravidian and Munda languages. (From Trautmann 1981:10)

There have also been many hypotheses on the origin of the Dravidian languages themselves. Based on the Nostratic hypothesis, Dravidian has been suggested to be akin to Proto-Elamite, which was spoken in the Fertile Crescent. It has been proposed that speakers of this language spread eastwards towards the Indus region along with farming technology (McAlpin, 1981; Cavalli-Sforza, 1996; Renfrew, 1996). The Neolithic settlement (~6,500 years) at Mehrgarh (south-west Pakistan), which shows a continuum of artefacts in its stratigraphy for about 4 Ky together with evidence of barley cultivation and agro-pastoralism by a sedentary people, is suggestive of a Dravidian cultural element (Kochar, 2001). Earlier studies have suggested demic expansion as a cause of the dispersal of many ancient populations, and the eastward dispersal of Dravidian has been attributed to such an expansion (Renfrew, 1996). Such a dispersal is supported by the mtDNA and Y chromosomal similarity of the Brahui, North Dravidian language speakers in Pakistan, to Middle Eastern populations (Krishnamurti, 2003; McElreavey and Quintana-Murci, 2005). It is possible that Dravidian languages were an ancient ‘Lingua Franca’ of a wider area from the Fertile Crescent to India (Pitchappan, 2002). The present-day existence of Dravidian languages in India, however, unequivocally credits at least India with having nurtured this language family. It has long been of interest to identify the people who were responsible for this language, or at least the land that first gave birth to and nurtured it. In this case, language can very much be equated with culture.




5.3.2. NRY HG L1 chromosomal evidences. 

With this background, I now interpret the results obtained in the present study in the light of those available in the literature on NRY as well as other genetic markers. The study by Qamar et al. (2002) identified HG L as one of the common haplogroups in populations of Pakistan (14%), with the exceptions of the Hazara and Kashmiris. The admixture analysis showed a non-Jewish origin of HG L, with the caution that sample sizes were low. Similar analysis also eliminated the possibility of a Syrian origin of HG L among the IE-speaking Baluch. Their study suggested a Neolithic origin of HG L that might be associated with the local expansion of farmers (TMRCA: ~7,000 years; 95% CI: 4,000-14,000). The study by Cordaux (2004) on Indian samples suggested that a package of HGs (J2, R1a, R2 and L) was of non-Indian origin, as they were present at higher frequencies in caste populations.

The Y chromosome phylogenetic tree of HG L (2008) is given in Fig 69. 

HG L1, a subgroup of HG L, has been associated with Dravidian languages (Sengupta et al., 2006). This was essentially based on the microsatellite variance of HG L1, which was higher in south India than in the Indus region. That study indicated the possibility of early diversification in Dravidian speakers and subsequent expansion towards peripheral regions, thus supporting an indigenous origin of HG L1 during the early Holocene (~9 Kya).

The origin and distribution of NRY HG L and its descendants are crucial for defining the Dravidian question. HG L*-M11 has been identified in Turkey (Cinnioglu et al., 2004). The YSTR variance in Armenians was found to be 0.41 (N=22), with an age of 14.6 Kya and a beta mean of 26.3 Kya. These estimates overlap with the ages of Indian HG L1 (N=376). But the Armenian and Turkish populations do not show the M27 mutation, a defining mutation of HG L1 that is characteristic of Indian and Pakistani lineages (Cinnioglu et al., 2004). Armenian YSTR lineages match most of their Turkish counterparts. This provides a clue to the absence of HG L1-M27 in Armenians. The study also eliminated the possibility of Syria (HG L: 3.98%) or Pakistan (HG L: 9%) being the origin of HG L1, but it was limited by a small sample size. The North Dravidian speakers, the Brahui, showed HG L1 in only 1/25 samples (Sengupta et al., 2006). Hence the question of whether the origin of HG L1 is associated with Dravidian speakers in south India demanded further exploration.

Figure 69: Y chromosome phylogenetic tree of HG L with its defining SNPs (L: M11, M20, M22, M61, M185, M295; L1: M27, M76; L2: M317; L2a: M349; L2b: M274)

In the present study, the north Indian populations (yellow cluster in Fig 40) showed a small effective population size (351; 95% CI: 133-1,001), whereas the Deccan showed 2,888 (95% CI: 1,133-11,760). The TMRCA for North India was lower (~22 Kya) than for the Deccan (~74 Kya), with marginally overlapping confidence intervals. The population expansion times of 16,975 (95% CI: 1,575-31,025) in the Deccan and 15,725 (95% CI: 9,275-26,875) in the North Indian regions (Table 22) indicated a dispersal of HG L1 from the Deccan to the North Indian regions. Further, within the Deccan, HG L1 expanded in SDR speakers, who showed an average of 6.57% of HG L1 compared with only 1.52% in Central Dravidian (CDR) speakers, i.e., about four-fold higher than CDR.

Further stratification of the data by linguistic state, together with the age estimates and other statistical analyses, narrowed down Karnataka or Tamil Nadu as the most probable region of expansion of HG L1. The expansion time of HG L1 was higher in Tamil Nadu (17 Kya) than in Karnataka (10.5 Kya), though with overlapping confidence intervals. The YSTR variance and ASD estimates were all higher in Tamil Nadu. But the effective ancestral population size (Na) of Tamil Nadu was lower than that of Karnataka: this suggested a smaller founder population that expanded to a greater extent in Tamil Nadu, and possible in-migration from nearby regions into Karnataka. Of the three subtypes of L tested, HG L3*-M357 was seen at minimal frequencies in Tamil Nadu in the present study, whereas L2-M317 was absent. Further identification of sub-haplogroups of L1 would resolve whether Tamil Nadu could be the origin, or at least the site of the earliest settlement and successful expansion, of L1.

To decipher the exact region of origin of an HG, several factors need to be taken into consideration, such as haplogroup frequency, a significant Fisher's p value, accumulated YSTR diversity, high age, high Na and the presence of other HG subgroups within the same geographical area. As it stands, the data from the present study qualify Tamil Nadu, with all the above features, as a candidate for the origin of L1 in India.
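
One of the criteria listed above, a significant Fisher's p value for haplogroup frequency, can be sketched as follows. This is a minimal illustration only: it assumes a one-sided Fisher's exact test on a 2×2 table (carriers versus non-carriers of an HG, inside versus outside a region), which may differ from the configuration actually used in this study; the function name and the counts are hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the first row.

    Table layout (assumed for this sketch):
        region:    a carriers, b non-carriers
        elsewhere: c carriers, d non-carriers
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    total = a + b + c + d      # all chromosomes
    carriers = a + c           # all carriers of the haplogroup
    region_n = a + b           # chromosomes sampled in the region
    # Integer sum of the tail, then a single division to limit rounding error
    tail = sum(comb(carriers, k) * comb(total - carriers, region_n - k)
               for k in range(a, min(carriers, region_n) + 1))
    return tail / comb(total, region_n)

# Hypothetical counts: 8 of 20 carriers in the region, 2 of 20 elsewhere
p = fisher_one_sided(8, 12, 2, 18)
print(p < 0.05)
```

A small p value indicates that the haplogroup is over-represented in the region relative to the comparison group; on its own it does not identify a place of origin, which is why the other criteria are considered alongside it.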

Interestingly, one of the samples studied from India showed L*-M20, the parent of clade L1. L*-M20 has been identified in Lebanese and Arabian populations, but at very low frequencies (0.051 and 0.018 respectively) (Zalloua et al., 2008). The other two derivatives of L*, viz. L2 and L3, are seen in the world but have presumably not been as successful in expansion as L1. East Caucasus populations possessed NRY HG L2-M317 (<3%), and it was present at 1% in Tajiks (Haber et al., 2012). HG L2 in India was present at a frequency of 0.08%. This HG is also referred to as the Mediterranean haplogroup. In contrast, L3 has been identified in the present study in many Northern Indian populations. Two successful interconnected 8-STR lineages emerged, one comprising all the Northern Indian samples studied and the other comprising all the samples of the Southern Indian populations as well as samples from Pakistan, Afghanistan and the East Caucasus, but with these latter samples appearing in the terminal branches of the network (Fig 46).


118 


The TMRCA of HG L1 dates to the Holocene (27,524; 95% CI: 19,473-41,808), with expansion occurring during the early Neolithic in the majority of the study regions. This is in consensus with other studies (Thangaraj et al., 2010). The Neolithic in south India was predominantly agro-pastoral. Therefore it is probable that the spread of HG L1 among the SDR speakers of south India was mediated by farming.

5.4: The HG L3*-M357: Brokpa and their ‘Aryan’ claim: 

As per the NRY HG phylogeny, clade L-M20 is divided into three subclades, of which L1 is common in India (present study), L2 is distributed in Turkey, and L3 is the most enigmatic and sparse. HG L3* is defined by the M357 SNP mutation and is a sub-haplogroup of HG L-M20 (Fig 69).

Each of these mutations has a distinct geographical affiliation and polarity of spread (Sengupta et al., 2006). HG L3*-M357 is present in Afghanistan and Pakistan (7.4% and 6.8% respectively) (Lacau et al., 2012; Abu-Amero et al., 2009), and much lower frequencies (0.6% each) were identified in SAR, Iran and UAR (Abu-Amero et al., 2009). An earlier study from this laboratory suggested an external origin of HG L3*-M357, probably due to recent gene flow from western Eurasia (Arunkumar et al., 2012). Hence in the present study an attempt was made to elucidate the migratory pattern of HG L3* in India. All the L3* samples obtained in The Genographic Project, in addition to my own investigations in the 4 states explored, were thus considered, along with those in the literature, to address the question of the origin and dispersal of HG L3*-M357.

In the present Genographic study, L3* samples were identified sporadically in various regions of India (Fig 44a). The pattern of distribution and various statistical analyses suggested two distinct migration pathways for HG L3*-M357. The first was the movement of people from Afghanistan to the Jammu region via Pakistan, presumably mediated by Dardic (IE language family) speakers. The Brokpa (Buddhist and Islam) populations, subsisting on pastoralism in the Dha Hanu region of Jammu, show a high frequency of L3*-M357 (69% and 38% respectively). The oral migratory history of the Brokpa claims that their ancestors moved to the Dha Hanu villages from the Gilgit region, which borders Pakistan and Afghanistan. The dating of this movement, based on the BATWING phylogenetic tree of HG L3*, shows a time depth of ~18 Ky, since when the populations of Jammu have remained isolated from the rest of their neighbours. The L3*-M357 chromosomes of 6 populations of Himachal Pradesh, 3 of Punjab and 10 of Rajasthan have been separated from these Jammu populations for at least 14 Ky, as deciphered in the network and BATWING tree of the present study (Fig 46, 48). The network clearly showed the dispersal of this branch of L3* STRs along with the IE-speaking populations of the northern and western Indian states studied under The Genographic Project (unpublished).

Here we need to address the claim of the Brokpa to be ‘Pure Aryan’, which has attracted the attention of the world and of many European visitors. This raises a question about the identity of the ‘Aryans’ themselves: whether there was a putative hegemonic Central Asian gene pool that spread, or whether there was no such thing as an ‘Aryan’ race or gene pool, as propounded of late by many historians of India (Thappar, 1990). The predominant presence of L3* in the Brokpa and its sporadic presence in Pakistan, together with its reasonable and widespread presence in isolated populations of Northern and Southern India, both IE-speaking and DR-speaking (see below), suggest an early dispersal, though an unsuccessful expansion, of this clade in India. Essentially this argument negates the hegemonic claim of the Brokpa as such. I hasten to add that the present-day consensus, that there was no Aryan race and no single invasion of a Central Asian population and that the ‘Aryan invasion’ is a myth, is in consonance with the findings of the present study.

Interestingly, this RM network showed that the node radiating reticulation comprised only Dravidian-speaking populations of the four states of the Deccan (Karnataka, Kerala, Tamil Nadu and Andhra Pradesh) and Orissa. This might represent the second migration of L3* populations, into the Deccan in relatively recent times, i.e. ~7 Kya (as shown by BATWING), which marks the beginnings of settled life on the banks of the Indus river at Mehrgarh (Gupta, 2004). A very recent split between Afghanistan and Andhra Pradesh (595 years ago) is in consensus with historical events such as the decline of the Kakatiya dynasty and the emergence of military powers, the invaders presumably bringing this HG into Andhra Pradesh straight from Afghanistan. Mohyuddin et al. (2006b) identified a new SNP mutation (PK3) specific to the Kalash population, who reside in the remote mountains of the Hindu Kush ranges in North Pakistan. This population clustered with the Yadavas of South India. The Yadavas of Andhra Pradesh showed a frequency of 3% for HG L3*-M357. Considering the Yadavas, one is reminded of the Neolithic cattle keepers of the Deccan, and of the story of Lord Krishna and his Yadhu tribe, who presumably ruled a vast expanse of India from Dwaraka. However, the Kalash stood apart in the MDS plot from all the other Dravidian- and IE-speaking populations studied, indicating their distinctiveness.

The present study revealed an ASD age estimate and YSTR variance of HG L3*-M357 that were higher in Afghanistan (15,200 ± 4,400 and 0.31 respectively) than in the other study populations. Afghanistan has been one of the important crossroads for human migrations, and was an important stop along the Silk Road in ancient days (Lacau et al., 2012). It was also one of the earliest known regions for the domestication of wheat/barley, sheep, goat and cattle during the Neolithic. Further, earlier NRY studies have suggested gene flow of HGs such as L-M20, H-M69 and R2-M124 between Afghanistan and India, mediated by IE-speaking (Dari or Pashto) populations such as the Pashtuns or Pathans and the Tajiks (Lacau et al., 2012).

The global network of L3*-M357, using 9 STRs from the present study and from those available in the literature, revealed further interesting points. Once again, the Indo-European language speakers of India clustered distinctly, with a few East Caucasus samples in the distant peripheries, while most of the Caucasus (Chechen), Pakistan and Afghanistan samples shared haplotypes and clustered with many Dravidian-speaking populations of the Southern Indian states studied. The scenario suggested an evolution of L and L3* in a common expansion across these regions from the Caucasus to Pakistan, but the composition of the median HTs and the presence of the Chechen samples of the Caucasus suggested a later arrival there from the India/Pakistan regions.

Pathans, living south of the Hindu Kush mountains, constitute nearly 42% of the total population of Afghanistan and 15% of that of Pakistan (Lacau et al., 2012). Afghanistan showed a total frequency of 9% of L3*-M357. The Pathans have been attributed a Jewish history by Ahmad (1952), and Greek or Rajput ancestry by Bellew (1979) and Caroe (1958).

Haber et al. (2012) suggested, based on MDS and barrier analysis, that the genetic affinity and gene flow between Afghanistan and north and west India were due to interactions that could have existed since the establishment of the region’s first civilizations at the Indus Valley and the Bactria-Margiana Archaeological Complex. This Afghan-Indian population structure excluded Hazaras, Uzbeks and South Indian Dravidian speakers. The present study, with respect to HG L3*-M357 only, suggested an alternative hypothesis: the founder of L3*-M357, and perhaps of L as a whole, arose in Afghanistan or Pakistan, west of the Indus barrier, and entered India by two routes at two different time periods, an ancient one to the north through the passes of the Hindu Kush ranges and a much later one to the south, presumably by a coastal route.

Thus, in the light of the fact that the two nuclei of HG L3* show different directions of YSTR evolution, it can be proposed that the peopling of the Deccan and of Northern India was not uniform, and that population movements from north to south or vice versa were scarce during ancient times. Higher-resolution studies on the present cohort, employing newer markers and whole-genome scans, will throw further light on the conclusions drawn from the data set available here. We may also need higher coverage of samples from the Pakistan and Afghanistan regions to decipher a more accurate pattern of the migration and peopling of India and the Deccan through L and other clades.

5.5 The story of NRY HG Js as a marker for agricultural expansion: 

The Indian subcontinent has a long history of agriculture. Wheat, barley and jujube were domesticated as early as 9000 BCE; domestication of sheep and goat soon followed (Gupta, 2004) and continued in the Mehrgarh culture by 8000-6000 BCE (Baber, 1996; Harris et al., 1996). By the 5th millennium BCE agricultural communities had become widespread in Kashmir, and cotton was cultivated by the 5th millennium BCE-4th millennium BCE. Archaeological evidence indicates that rice was a part of the Indian diet by 8000 BCE (Nine et al., 2005). The Encyclopedia Britannica indicates that a number of cultures have evidence of early rice cultivation, including China, India and Southeast Asia. Moreover, irrigation technology was developed in the Indus Valley Civilization by around 4500 BCE (Rodda and Ubertini, 2004). Archaeological evidence has revealed an animal-drawn plough dating back to 2500 BCE in the Indus Valley Civilization (Lal, 2001).




Cavalli-Sforza et al. (1996), on the other hand, proposed that agriculture developed about 15,000-10,000 ybp in the Fertile Crescent, extending from Israel through Northern Syria to Western Iran. The mechanism of the eastward expansion was demic, i.e. by human migration (Cavalli-Sforza et al., 1996; Renfrew, 1988). It is hypothesized that proto-Dravidian, akin to the proto-Elamite first spoken in the Fertile Crescent, was thus carried with the demic spread and entered India, and that subsequent migrations of Indo-European language speakers, the pastoral nomads from the Central Asian steppes, presumably replaced the Dravidian language speakers in Western India. The Anatolian theory, on the other hand, claims that IE languages spread from Anatolia (present-day Turkey) with agriculture ~8,000-9,500 years ago (Gray and Atkinson, 2003; Bouckaert et al., 2012). What the genetic markers of these people were, and what their mode of dispersal was, are thus still debatable. Central Asian pastoralism, the mounting of the horse, the discovery of the wheel, and agriculture and its expansion are considered important milestones in the dispersal of our species.

The clinal patterns of the haploid genome NRY, the origin of agriculture and demic
expansion in Europe have been explored in many studies (Cavalli-Sforza, 1994; Rosser et al.,
2000). The J clade is present in appreciable frequencies in Europe, Anatolia, the Middle
East, the Indus valley, southern India and Algeria in Africa (Hammer et al., 2000). The
highest frequency of HG J is found in many populations of the Middle East, Iran and
Algeria. The Caucasus-Anatolia and European populations have moderate
frequencies (Quintana-Murci et al., 2001). The age of this haplogroup is 14,800 ± 9,700
ybp (Hammer et al., 2000), while in southwestern Iran the age was 5,500-17,400 ybp
(Quintana-Murci et al., 2001), suggestive of dispersal of this clade with agricultural
expansion eastward. This age estimate is similar to the ages calculated for the North
Indian populations (5,200; 95% CI: 3,000-9,500) (Mukherjee et al., 2001).
Quintana-Murci et al. (2001) have thus suggested that this haplogroup may have been brought
into India by Indo-European speakers from the Middle East. Cordaux (2004), on the
other hand, has suggested a Central Asian origin rather than a West Asian one. The studies on
India were based on smaller sample sizes (89 J chromosomes from 4 north Indian
populations in Mukherjee et al. (2001), 155 samples from 9 tribes and 1 caste in
Cordaux (2004), and 7 from Gujarat in Quintana-Murci et al. (2001)); they showed a
sporadic distribution of the J clade that does not conform to any particular state or
linguistic group. Hence the question of its dispersal in India was addressed in
the present study.

The present study thus investigated in depth the subtypes of the J clade, viz. J2-
M172, in 66 castes and 32 tribes from all over India under The Genographic Project. Thus 658
J2a*-M410 chromosomes and 532 J2b-M221/102 chromosomes were studied for J2-M172
(designated as per the ISOGG phylogenetic nomenclature, 2008) (Fig 70) and its
subtypes, i.e. NRY HG J2a-M410, J2b-M221/102, J2a4a-M47 and J2a4c-M68;
their distribution in India warranted a new interpretation.

The J2a showed a very interesting pattern of maximum diversity and
frequency in the Karnataka region. The 17 STR RM network showed no median
samples, but most of the southern Indian samples lay close to the hypothetical median
and the northern Indian samples in the periphery of the network, implying an expansion
from south to north. The 8 STR global network, including J2a of the Middle East and
other countries, showed a median constituted mostly by Lebanese samples, with the
radiating branches consisting of samples from many countries in one half and mostly
northern Indian samples in the other half; this was reflected very clearly in the MDS plot.
This indicated two different evolutions of this clade, with an early northern Indian
founder and another southern Indian founder. The NJ tree also showed that the northern
Indian samples, irrespective of region and whether they were Brahmin, Sikh, Rajput, Bhil,
Mourya or Himachali etc., all clustered distinctly with a much younger split, compared
to the much more ancient branch of international samples studied, strewn along Indian,
Chinese and European samples. Further, the UP Bhumihar and Mythili Brahmin were
distinct from the rest. In the older branch, the Palestine samples interestingly were
seen all over the branch, indicating the greatest diversity in them. This wider
distribution implies a rapid dispersal: a classical example was the clustering of
Nattukottai Chettiar samples with Lebanese and Palestine samples in the 8 STR based
network.

Figure 70: Y chromosome phylogenetic tree of HG J with its defining SNPs
(12f2.1, M304, P209, S6, S34, S35).
[Tree diagram not reproduced. Branch markers shown in the figure include M267, M367,
M368, M369, M340, P279, DYS413<18, L26/S57, L27, M47, M322, M67/S51, M289 (location
under DYS445<7 uncertain), M318, M158 (location under L24 uncertain), M12, M102, M221,
M314, M137, M205, M241, M99, M280, M321, P84 and DYS455<9.]

Majumdar et al. (2001) proposed that the Brahmin populations (showing a
higher HG J-12f2a frequency; 23.5%) had genetic contact with Aryan-speaking
groups. In India, Brahmins were the torchbearers and promoters of Aryan ritual
(Karve, 1961). But Sengupta et al. (2006) suggested that HG J2 is more predominant
in Dravidian than in Indo-European populations, by considering the Brahmin
populations of South India (Iyengar and Iyer) as DR speakers. The present study
shows the predominance of HG J2a*-M410 among IE speakers (6.2%). Brahmins
(20.6% of total HG J2a*-M410) of India were considered to be IE speakers in this
study irrespective of their geography or the languages they have presently adopted,
whereas HG J2b-M221/102 did not show such a characteristic language affiliation
(3.7% in IE, p value = 8E-03, and 4.5% in SDR, p value = 2E-07). Other IE populations
such as Maldhari (58.3%, but YSTR diversity was nil) and Parsee (33.7%) also showed
appreciable proportions of HG J2a*. The interesting story of J2a-M410 was the
clustering of various Brahmin populations of India with different Indian and world
populations in the NJ tree, with the much younger branch of Himachal, Punjab and
Rajasthan populations showing a distinct and later spread. The age of J2a (~20,000 ybp)
fits well with the scenario of rapid spread without much sedentary lifestyle, with the lower
frequencies in most areas implying local amalgamation and, at times, isolation.
The presence of J2a* in the Jammu, Himachal Pradesh, Punjab, Rajasthan
belt is interesting, but no clue to its origin could be obtained. A more recent
movement of people from Iran to India was that of the Parsees, who came as refugees
during the 10th century AD (possessing 33.3% of HG J2a and 3.5% of HG J2b-M221/102)
(Nanavutty, 1970) through the western corridor into Gujarat and later migrated to
Bombay province.

Earlier studies that compared western Asian populations and Indians revealed
that Indian populations showed low YSTR diversity within HG J (Quintana-Murci et
al., 2001; Nebel et al., 2002). In the present study, Palestine populations showed the
highest YSTR variance and ASD age (0.70 and 26,872 ± 6,166 years), whereas Lebanon and
India (pooled) showed similar variance and ASD estimates (0.53 and
~20,500 years) for HG J2a*-M410. However, the effective population sizes, TMRCA
and population expansion times suggest that the Middle East populations were more
ancient and that the Indian HG J2a*-M410 is a subset of those YSTRs. Haplotype sharing was
also observed between the Lebanese and Indian populations, indicating the
presence of the same 8 STR haplotypes in India. This suggests an exogenous origin of
HG J2a*-M410 in India.
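
The variance and ASD figures compared above are simple summaries of YSTR repeat counts. The following is a minimal sketch of how such summaries may be computed, not the procedure used in this study: the function names, the toy haplotypes, the use of the per-locus median as a stand-in for the founder haplotype, and the mutation rate and generation time in the usage lines are all illustrative assumptions.

```python
from statistics import mean, median, variance


def locus_columns(haplotypes):
    """Transpose equal-length haplotypes into one tuple of repeat counts per locus."""
    return list(zip(*haplotypes))


def mean_str_variance(haplotypes):
    """Average, over loci, of the sample variance in repeat number."""
    return mean([variance(col) for col in locus_columns(haplotypes)])


def asd_to_founder(haplotypes, founder=None):
    """Average squared distance (ASD) between each haplotype and a founder.

    If no founder is given, the per-locus median is used as a stand-in
    (an assumption made for this sketch).
    """
    columns = locus_columns(haplotypes)
    if founder is None:
        founder = [median(col) for col in columns]
    per_locus = [
        mean([(allele - f) ** 2 for allele in col])
        for col, f in zip(columns, founder)
    ]
    return mean(per_locus)


def asd_age_years(asd, mutation_rate, years_per_generation):
    """Convert ASD to years under a stepwise model: generations = ASD / rate."""
    return asd / mutation_rate * years_per_generation


# Hypothetical 3-locus haplotypes (repeat counts), for illustration only.
toy_haplotypes = [
    (14, 23, 10),
    (14, 24, 10),
    (15, 23, 10),
    (14, 23, 11),
]

toy_variance = mean_str_variance(toy_haplotypes)
toy_asd = asd_to_founder(toy_haplotypes)
# Assumed rate (per locus per generation) and generation time; not from this study.
toy_age = asd_age_years(toy_asd, mutation_rate=6.9e-4, years_per_generation=25)
print(toy_variance, toy_asd, round(toy_age))
```

Because the age scales inversely with the assumed mutation rate, the same ASD can yield very different ages under different rates; comparisons between populations are therefore most meaningful when one rate is applied throughout.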

In HG J2b the YSTR variances and ASD estimates were higher in Indian
populations than in the other global populations studied here. But
high microsatellite variation could also result from repeated gene flow. This can be
seen in study states such as Maharashtra, which showed high variance, but
as the result of diverse sources of YSTRs, as reflected in its mismatch
distribution. No distinction of northern Indian populations from the rest of the
samples was observed in the global network and MDS plots, but with 17 STRs, as in
J2a, the north-western state populations were seen in the outermost layers of the RM
network. The pooled ASD age for HG J2b-M221/102 among all Indian study states
(17 YSTRs) was at the higher limit, 19,241 ± 5,066 years. Sengupta et al.
(2006) reported an age of ~13 Kya for HG J2b2 and suggested that this
HG appeared in India before agriculture. Semino et al. (2004) reported a high frequency of HG
J2b-M102 in the southern Balkans and north-central Italy and suggested a population
expansion from these regions, but Cinnioglu et al. (2004) reported high STR
variance in southwest Asia (0.33), contradicting this suggestion.
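
A mismatch distribution of the kind referred to above tallies how many differences separate every pair of haplotypes in a sample; several separated peaks are one signature of haplotypes drawn from diverse sources. The sketch below is only an illustration under stated assumptions: the function names and toy haplotypes are hypothetical, and a mismatch is counted here as a locus at which two haplotypes differ, which may not be the definition used by the software employed in this study.

```python
from collections import Counter
from itertools import combinations


def pairwise_mismatches(hap_a, hap_b):
    """Number of loci at which two haplotypes carry different repeat counts."""
    return sum(1 for a, b in zip(hap_a, hap_b) if a != b)


def mismatch_distribution(haplotypes):
    """Map each mismatch count to the number of haplotype pairs showing it."""
    counts = Counter(
        pairwise_mismatches(a, b) for a, b in combinations(haplotypes, 2)
    )
    return dict(sorted(counts.items()))


# Hypothetical sample mixing two unrelated sources (illustration only).
toy_sample = [
    (14, 23, 10),
    (14, 23, 10),
    (14, 24, 10),
    (16, 21, 12),
    (16, 21, 13),
]

print(mismatch_distribution(toy_sample))
```

In this toy sample the pairs fall into two groups, within-source pairs with 0 or 1 mismatches and between-source pairs with 3, which is the kind of multimodal pattern that accompanies high variance produced by mixed origins rather than by great age.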

Therefore, in conclusion, the present observations on a larger sample size and
comparison with international data sets suggest an exogenous origin of HG J2a-
M410 and J2b-M221/102, probably in the Middle East, and a spread to various regions of
India in different ways, perhaps by different routes through the coast and the Hindukush
ranges. The extent of differentiation of the HG J clades and their associated
microsatellites has also indicated the Middle East as their likely homeland. In this area, J-
M172 and J-M267 are equally represented and show the highest degree of internal
variation, indicating that these subclades most likely also arose in the Middle
East. The age estimates of HGs J2a-M410 and J2b-M221/102 suggest that these HGs
appeared during the Mesolithic (~20 Kya), prior to the beginning of
agriculture. In India, too, two different evolutions of YSTRs have been identified
within HG J2a*-M410, in north and south Indian populations. Sengupta et al.
(2006) proposed an eastward expansion of J2a-M410 to Iraq, Iran and Central Asia
coincident with painted pottery and ceramic figurines, well documented in the
Neolithic archaeological record (Cauvin, 2000). Earlier studies have also indicated the
movement of other material culture towards the Indus. Hence the spread of HG J2 cannot
be correlated with agriculture only. Pottery, trade and colonization without substantial
military intervention also drove wealth and technological and cultural development.
Examples of ways in which genetic migration was mediated might include the silk and
spice roads, which connected China with the Middle East through to Europe, as well
as to spice sources in India and Indonesia, and the Incense Road, which connected
India through the southern Arabian Peninsula (Zalloua et al., 2008). The presence of
J2 in Chinese populations vouches for this.

Hence, no clue to a real agricultural expansion with the J clade could be identified
in the present study. This clade, though older, must thus have been carried by
criss-cross movements of people with the advent of trade. In general, the later the HG in
the phylogeny, the more rapid were its expansion and diversification, presumably
depending on the success of these populations as each technological development
empowered them.

5.6 The story of Nattukottai Chettiars and fidelity of patriliny 

Various subclans (Kovils) of the Chettiars were quite distinct from each other,
and this was reflected in the Y chromosome composition and structure analysis.
The brotherly clans ‘Erani Kovil’ and ‘Pillayarpatti’ showed a split from their
common ancestor only 462 years ago. These clans possessed high L1 frequencies and showed
isolated evolution of YSTRs within HG L1, as reflected in the network. The other clans,
Mathur_Uraiyur (HG H1a*), Surakudi (HG L1) and Illupakudi (HGs H1a*, L1 and
R1a1a), possessing mainly autochthonous Indian lineages, seem to have amalgamated
around 2000 years ago, whereas clans such as Mathur_Arumbakur (HG
F*), Nemam (HG J2a and R2) and Elayathakudi Okkur (HG J2a) showed a higher
coalescent time of 10,766 ybp. The J2a* sources of Nemam and Okkur were
different, as revealed by the network and mismatch distribution.


Considering the oral history of the Nattukottai Chettiars that they migrated from
Chola country to the present territory, it is probable that the populations constituting
the Vairavan and Nemam Kovils, with high fidelity of HG J2 clades, migrated from
different directions to amalgamate with HG L1-M27/76, which could have been the
predominant pre-existing group. It is known that L1 had a huge expansion in the Deccan,
and Tamil Nadu harbours the highest frequency and diversity of L1. In the L1 global
network (Fig 40) the Nattukottai Chettiars showed a distinct evolution from the early
population of the Deccan (see L1 section). They stood separate from other dry land farming
populations of Tamil Nadu (ArunKumar et al., 2012), indicating their common origin
well before caste formation.

The varna system was introduced into Tamil Nadu only during the post-Sangam
Pallava/Chola period, though a well stratified society with professional occupations had
existed, as evidenced by various Sangam literature (Sastry, 1975). Patriliny
is known in many agricultural populations and is thought to have originated in
Kazakhstan; with the advent of settled agriculture and land holding, the pattern of male
inheritance came into vogue. Dravidian society was indeed female centric,
and male hegemony, with patriarchal, patrilineal inheritance
and land holding, was introduced into Tamil society only during the later Chola period.
The Adeenam and Mutts were also introduced as agricultural and spiritual centres.
The Nattukottai Chettiars (NC) were indeed considered the descendants of Kovalan and
Kanagi, the main characters of the great Sangam epic Silapathikaram. The high standards of
cultural evolution of the NC, with their materialistic view of the world even in a small
territory, made them among the most stringent followers of this patriliny and caste system.
The multiple NRY HGs, each specific to a clan, speak for the fidelity of their belief system.
The adoption of both Dravidian kinship and many Vedic rituals in fact
supports different waves of migration and amalgamation. Our recent studies showed
that societal stratification occurred 7000 ybp and that the various population groups
did not admix during the past 300 years. This is consistent with the occurrence of a single,
unique NRY HG lineage in many of the clans studied; thus two or three
different and distinct groups with similar value systems should have migrated and
amalgamated, perhaps in quick succession. The affinities of L1 to Dravidian
populations, J2a to Middle East populations, R1a1a to Vedic/Central Asian/North Indian
populations and F* to ancient settlers can be further unravelled only by whole genome
studies.

Thus caste formation among the Chettiars has been the result of multiple
migrations and settlements. The genetic data were well supported by the oral migratory
history of the Nattukottai Chettiars.



6. CONCLUSION 


The present study on the populations of Gujarat, Maharashtra, Karnataka and
Andhra Pradesh, along with the other studies on Tamil Nadu and Orissa from this
laboratory and those from the literature, has led to the following conclusions and
helped decipher the factors that determined the peopling of India. The important
findings from this study are presented below:


1. The genetic structure of the various study states of the Deccan and Gujarat is not
   uniform.

2. The parameters that determine the genetic structure were not the same in the
   various study states, and thus India cannot be considered a single gene pool.

3. The castes and tribes of Gujarat were distinct in their NRY genetic profile, and
   the variation between them was high among Gujarat populations.

4. Language and geography were the most important isolation parameters
   among the Maharashtra study populations.

5. Most of the Karnataka study populations showed high NRY HG variance and
   thus no one-to-one correlation with language, geography, the caste-tribe
   divide, subsistence or other social characteristics.

6. The split ages of the study populations from Karnataka obtained from
   BATWING analysis indicate that the agriculture-based populations split
   ~4.5 Kya, which correlates with the ‘ash mound tradition’ of the southern Neolithic
   culture and cattle keepers (Fuller, 2006).

7. The study populations of the Coastal and Godavari belt of Andhra Pradesh were
   structured based on their mode of subsistence, with no recent gene flow for
   at least the past ~1.7 Ky.


8. The ages of the study populations of Gujarat and Andhra Pradesh correlated with the
   period of millet (~4.5 Kya) and later rice cultivation (in Andhra Pradesh:
   ~2 Kya). The existence of agricultural societies in these regions fits well with
   the advent of the Varna system itself (1.2 Kya).

9. Not all the tribes had a shared ancestry. Different tribes in the study
   region showed different origins (as evidenced by the Siddis or the Korku). Some
   showed a shared ancestry with other populations.

10. NRY HG L1-M27/76 may further be qualified as a marker for South
    Dravidian populations, extending the study of Sengupta et al. (2006).

11. Tamil Nadu may thus be the candidate for the origin or early settlement and
    successful expansion of HG L1-M27/76 in India, with a small effective
    population size and a maximum population expansion time (17 Kya), and for
    seeding to other regions of India. However, the hypothesis of proto-Dravidian
    demic diffusion along with agriculture, as suggested by Renfrew (1988),
    cannot be eliminated. The presence of the Syrian, Afghanistan and Pakistan
    median haplotype and its wide distribution (at 9 YSTR resolution) in
    Himachal Pradesh, coastal Gujarat and Maharashtra and all over the
    Deccan, but not in Uttar Pradesh and to its east, may still suggest an origin
    of this clade to the far west of India and a spread towards India.

12. The HG L3*-M357 distribution patterns suggested two alternative migratory
    routes from the land of its origin in Afghanistan or Pakistan, or further
    west. The two routes of entry into India were probably taken at two different time
    periods: an ancient one to the north, through passes in the Hindukush ranges
    ~18 Kya, settling in Himachal Pradesh, and another to the south, much later
    (~7 Kya), presumably through the coastal route, as represented by the samples
    from Chechen-East Caucasus-Dagestan, Afghanistan and the Deccan
    (8 STR global network).


13. Analysis of HG J2a-M410 and HG J2b-M221/102 suggests an exogenous origin
    of these HGs in India, probably in the Middle East. These HGs probably spread into
    India by different routes.

14. Diverse YSTR evolution of HG J2a*-M410 has been identified among the
    north and south Indian populations, whereas no such pattern was observed
    in HG J2b-M221/102.

15. The ASD ages of HGs J2a-M410 and J2b-M221/102 were calculated to be
    ~20 Kya and did not provide any clue to a real agricultural expansion
    within the J clade in the present study. The dispersal of these clades in India can
    also be attributed to pottery, military intervention or refugee movements.

16. The patrilineal clan system of the Nattukottai Chettiars clearly showed the fidelity
    of the Varna system in them: the strict adoption of patrilineal inheritance of
    their ‘Kovil’ (Temple=clan), each clan having mostly one HG. Their oral
    tradition of having relocated from Kaveripoompatinum to the present expanse of
    ‘Nattukottai’ might be true. Nonetheless, the various HGs and their dating
    signal that an incoming group of traders with J clades (Vairavan Kovil
    and Nemam) might have amalgamated with an earlier settler group
    having preponderant L1, quite distinct, though, from the other L1s
    present in other populations of Tamil Nadu; the presence of R1a1 only in
    Kalanivasal, Vaizhnava worshippers of Elayathakudi, may be a later addition.
    The age calculations of these clades in the NC show their caste formation during
    various periods (J2, L1 and R1a1). The presence of only F* in one clan and O2a in
    another are further indicators of amalgamation of local and distant
    populations at the time of caste formation or much later. The oral tradition
    that refers to the Erani and Pillayarpatti clans as “brotherly” clans that do not
    inter-marry was corroborated on various counts: they occur together in the NJ tree,
    share L1 haplotypes, and show a young split time of 462 years in
    BATWING analysis.

17. The study brought out the diversity of Brahmin populations. The various Brahmin
    populations studied from different states did not isolate themselves. In most
    of the analyses, they were seen mixed with other caste and tribe populations in
    networks and trees. Nonetheless, various Brahmin populations shared specific
    HTs with different Indian and global populations, suggesting their affinity and
    origin with these people and their land. The most striking of these was the
    median in the J2a cluster, etc.

Thus, the study populations reveal different histories and distinct
genetic legacies of various populations and states. The evolutionary factors and
genetic phenomena operating on them were not the same. The structure of the
populations was laid down as demes in various regions much earlier than
they were identified by different names, and before the advent of Varna. Further
analysis with increased sample size from global populations and with higher
resolution is warranted. mtDNA and whole genome analysis would
provide deeper insights into population histories.



6. REFERENCES 


Abu-Amero, K., Hellani, A., Gonzalez, A., Larruga, J., Cabrera, V., Underhill, P., 
2009. Saudi Arabian Y-Chromosome diversity and its relationship with nearby 
regions. BMC Genetics 10, 59. 


Agrawal, J.C.A.A.S.P., Agrawal, S.P., Aggarwal, S.S.G., J. C., 1995. Uttarakhand: 
Past, Present, and Future. Concept Publishing Company. 


Agrawala R C, 1989. An Encyclopedia of Indian Archaeology, Ghosh A.(Ed.). ed. 
New Delhi (Munshiram Monoharlal). 


Ahmad AKN, 1952. Jesus in heaven on earth. 

Allelic discrimination assay, ABI online protocol, 2012.

Al-Zahery, N., Semino, O., Benuzzi, G., Magri, C., Passarino, G., Torroni, A.,
Santachiara-Benerecetti, A.S., 2003. Y-chromosome and mtDNA
polymorphisms in Iraq, a crossroad of the early human dispersal and of post-
Neolithic migrations. Molecular Phylogenetics and Evolution 28, 458-472.


Ammerman, A.J., Cavalli-Sforza, L.L., 1984. The Neolithic Transition and the Genetics of
Populations in Europe. Princeton University Press, Princeton.


ArunKumar G, 2012. Studies on the NRY and mtDNA Genomic Diversity of selected 
Indian populations carrying the NRY haplogroups R and O: Deciphering the 
early settlement of Austro-Asiatic and Indo-European speakers of India. 

ArunKumar, G., Soria-Hernanz, D.F., Kavitha, V.J., Arun, V.S., Syama, A., Ashokan, 
K.S., Gandhirajan, K.T., Vijayakumar, K., Narayanan, M., Jayalakshmi, M., 
Ziegle, J.S., Royyuru, A.K., Parida, L., Wells, R.S., Renfrew, C., Schurr, T.G., 
Smith, C.T., Platt, D.E., Pitchappan, R., The Genographic Consortium, 2012. 
Population Differentiation of Southern Indian Male Lineages Correlates with 
Agricultural Expansions Predating the Caste System. PLoS ONE 7, e50269. 


Asko Parpola, 2005. Study of the Indus Script. 


Austroasiatic languages -- Britannica Online Encyclopedia [WWW Document], 2012.
Encyclopedia Britannica. URL
http://www.britannica.com/EBchecked/topic/44541/Austroasiatic-languages


Ausubel, F.M., Brent, R., Kingston, R.E., Moore, D.D., Seidman, J.G., Smith, J.A., 
Struhl, K. (Eds.), 2002. Short Protocols in Molecular Biology,5th Edition, 5th 
ed. Current Protocols. 


Baber, Zaheer (1996). The Science of Empire: Scientific Knowledge, Civilization, and 
Colonial Rule in India. State University of New York Press. 19. ISBN 0-7914- 
2919-9. 


Balasubramanian D, Appaji Rao N, 1998. The Indian human heritage. University 
Press, Hyderabad. 


Bamshad, M., 2001. Genetic Evidence on the Origins of Indian Caste Populations. 
Genome Research 11, 994-1004. 


Bamshad, M.J., Watkins, W.S., Dixon, M.E., Jorde, L.B., Rao, B.B., Naidu, J.M., 
Prasad, B.V.R., Rasanayagam, A., Hammer, M.F., 1998. Female gene flow 
stratifies Hindu castes. Nature 395, 651-652. 


Bandelt, H.J., Forster, P., Röhl, A., 1999. Median-joining networks for inferring
intraspecific phylogenies. Mol Biol Evol 16, 37-48.


Basu, A., Mukherjee, N., Roy, S., Sengupta, S., Banerjee, S., Chakraborty, M., Dey, 
B., Roy, M., Roy, B., Bhattacharyya, N.P., Roychoudhury, S., Majumder, 
P.P., 2003. Ethnic India: A Genomic View, With Special Reference to 
Peopling and Structure. Genome Res. 13, 2277-2290. 


Bellew HW, 1979. The races of Afghanistan. 


Berniell-Lee, G., Calafell, F., Bosch, E., Heyer, E., Sica, L., Mouguiama-Daouda, P., 
Veen, L. van der, Hombert, J.-M., Quintana-Murci, L., Comas, D., 2009. 
Genetic and Demographic Implications of the Bantu Expansion: Insights from 
Human Paternal Lineages. Mol Biol Evol 26, 1581-1589. 


Bhasin MK, 2006. Genetics of Castes and Tribes of Indiall: Indian Population Milieu. 
Int J Hum Genet 6, 233-274. 


Bhasin MK, Nag Shampa, 1994. Incidence of consanguinity and its effects on
fertility, mortality and morbidity in Indian Region, In: Veena Bhasin (Ed.),
People, Health and Disease: The Indian Scenario. Kamla-Raj Enterprises,
Delhi.


Boivin N, Fuller D, Korisettar R, Petraglia M, 2008. First Farmers in South India: The
role of internal processes and external influences in the emergence and
transformation of south India’s earliest settled societies.


Bongard-Levin. G, 1979. A history of India. Progress Publishers, Moscow.


Bouckaert, R., Lemey, P., Dunn, M., Greenhill, S.J., Alekseyenko, A.V., Drummond, 
A.J., Gray, R.D., Suchard, M.A., Atkinson, Q.D., 2012. Mapping the Origins 
and Expansion of the Indo-European Language Family. Science 337, 957-960. 


Burgarella, C., Navascués, M., 2011. Mutation rate estimates for 110 Y-chromosome 
STRs combining population and father-son pair data. Eur. J. Hum. Genet. 19, 
70-75. 


Butler JM, 2001. Biology and technology and genetics of str markers, 2nd ed. 
Academic Press, London. 


Cann, R.L., Stoneking, M., Wilson, A.C., 1987. Mitochondrial DNA and human
evolution. Nature 325, 31-36.


Capelli, C., Wilson, J.F., Richards, M., Stumpf, M.P.H., Gratrix, F., Oppenheimer, S., 
Underhill, P., Pascali, V.L., Ko, T.-M., Goldstein, D.B., 2001. A 
Predominantly Indigenous Paternal Heritage for the Austronesian-Speaking 
Peoples of Insular Southeast Asia and Oceania. Am J Hum Genet 68, 
432-443. 


Caroe O, 1958. The Pathans. 


Cattell R, 1966. The scree test for the number of factors. Multiv. Behav. Res 1,
245-276.


Cauvin J (2000) The birth of the gods and the origins of agriculture. Cambridge 
University Press, Cambridge, United Kingdom 

Cavalli-Sforza LL, 1996. The spread of agriculture and nomadic pastoralism: Insights 
from the genetics, linguistics and archaeology, In: Harris DR, editor. ed. 
Smithsonian Institution Press, Washington, DC. 


Chaix, R., Quintana-Murci, L., Hegay, T., Hammer, M.F., Mobasher, Z., Austerlitz, 
F., Heyer, E., 2007. From Social to Genetic Structures in Central Asia. Current 
Biology 17, 43-48. 


Charlesworth, B., 2003. The organization and evolution of the human Y chromosome. 
Genome Biol 4, 226. 


Chauhan P, 2010. The Evolution and History of Populations in South Asia: Inter- 
Disciplinary Studies in Archaeology, Biological Anthropology, Linguistics 
and Genetics.’, Review of ‘Petraglia, M.D. and B. Allchin (eds.). ed. 

Cinnioglu, C., King, R., Kivisild, T., Kalfoglu, E., Atasoy, S., Cavalleri, G.L., Lillie, 
A.S., Roseman, C.C., Lin, A.A., Prince, K., Oefner, P.J., Shen, P., Semino, O., 
Cavalli-Sforza, L.L., Underhill, P.A., 2004b. Excavating Y-chromosome 
haplotype strata in Anatolia. Human Genetics 114, 127-148. 


Consortium, T.Y.C., 2002. A Nomenclature System for the Tree of Human Y- 
Chromosomal Binary Haplogroups. Genome Res 12, 339-348. 


Cordaux, R., 2004. Independent Origins of Indian Caste and Tribal Paternal Lineages. 
Current Biology 14, 231-235. 

Cordaux, R., Saha, N., Bentley, G.R., Aunger, R., Sirajuddin, S.M., Stoneking, M., 
2003. Mitochondrial DNA analysis reveals diverse histories of tribal 
populations from India. Eur. J. Hum. Genet. 11, 253-264. 

Costantini, L, 1987. Appendix B. Vegetal remains, In G. Stacul (Eds.). ed. Instituto 
Italiano per il Medio ed Estremo Orientale, Rome. 

Cruciani, F., Trombetta, B., Massaia, A., Destro-Bisol, G., Sellitto, D., Scozzari, R.,
2011. A Revised Root for the Human Y Chromosomal Phylogenetic Tree: The
Origin of Patrilineal Diversity in Africa. The American Journal of Human
Genetics 88, 814-818.

de Carvalho, C.M.B., Santos, F.R., 2005. Human Y-chromosome variation and male 
dysfunction. J Mol Genet Med 1, 63-75. 

Deccan plateau, India -- Britannica Online Encyclopedia [WWW Document], 2012.
Encyclopedia Britannica. URL
http://www.britannica.com/EBchecked/topic/154969/Deccan

Deraniyagala, Siran U, 1996. Pre- and Protohistoric settlement in Sri Lanka, XIII U. I. 
S. P. P. Congress Proceedings- Forli. International Union of Prehistoric and 
Protohistoric Sciences. Archived. 

Dhavalikar, M.K., 1997. Indian protohistory. Books & Books. 

Diamond, J., Bellwood, P., 2003. Farmers and Their Languages: The First 
Expansions. Science 300, 597-603. 

Elfenbein, J. H., 1987. A Periplus of the “Brahui Problem”. Studia Iranica 16, 
215-233. 

Excoffier, L., Laval, G., Schneider, S., 2005. Arlequin (version 3.0): An integrated
software package for population genetics data analysis. Evolutionary
Bioinformatics Online 1, 47-50.

Fluxus. Technology. Ltd, 2012. Phylogenetic Network Software. 

Foote R, 1876. The Geological Features of the South Mahratta Country and Adjacent 
Districts. Memoirs of the Geological Survey of India 12, 1-268. 

Forster, P., Röhl, A., Linnemann, P., Brinkmann, C., Zerjal, T., Tyler-Smith, C.,
Brinkmann, B., 2000a. A Short Tandem Repeat-Based Phylogeny for the
Human Y Chromosome. The American Journal of Human Genetics 67,
182-196.

Forster, P., Röhl, A., Linnemann, P., Brinkmann, C., Zerjal, T., Tyler-Smith, C.,
Brinkmann, B., 2000b. A Short Tandem Repeat-Based Phylogeny for the
Human Y Chromosome. Am J Hum Genet 67, 182-196.


Fuller, D., 2007. Non-human genetics, agricultural origins and historical linguistics in 
South Asia, in: Petraglia, M.D., Allchin, B. (Eds.), The Evolution and History 
of Human Populations in South Asia, Vertebrate Paleobiology and 
Paleoanthropology. Springer Netherlands, pp. 393-443. 


Fuller, D.Q., 2006. Dung mounds and domesticators: early cultivation and pastoralism 
in Karnataka [WWW Document]. URL http://discovery.ucl.ac.uk/107968/ 


Fuller DQ, 2006. The Archaeobotany of Indian Pulses: Identification, Processing and
Evidence for Cultivation [WWW Document]. URL
http://www.academia.edu/322740/The_Archaeobotany_of_Indian_Pulses_Identification_Processing_and_Evidence_for_Cultivation


Galouchko Vladimir, 2012. 3D Field.


Ghosh A, Panigrahi K C, 1946. Pottery of Ahichchatra (U.P). Ancient India 37-59.


Ghosh, A.K., 1970. The Paleolithic Cultures of Singhbhum. Transactions of the 
American Philosophical Society 60, 3. 


Gray, R.D., Atkinson, Q.D., 2003. Language-tree divergence times support the 
Anatolian theory of Indo-European origin. Nature 426, 435-439. 


Guha, B.S., 1936. The Racial Affinities of the People of India. Imprimerie médicale et 
scientifique. 


Gupta, Anil K. in Origin of agriculture and domestication of plants and animals 
linked to early Holocene climate amelioration, Current Science, Vol. 87, 
No. 1, 10 July 2004 59. Indian Academy of Sciences. 


Gyaneshwer Chaubey, Mait Metspalu, Monika Karmin, Kumarasamy Thangaraj, Siiri 
Rootsi, Juri Parik, Anu Solnik, Deepa Selvi Rani, Vijay Kumar Singh, Prathap 
Naidu, B, Alla G. Reddy, Ene Metspalu, Lalji Singh, Toomas Kivisild,
Richard Villems, 2008. Language Shift by Indigenous Population: A Model 
Genetic Study in South Asia. Int J Hum Genet 8, 41-50. 


Haber, M., Platt, D.E., Ashrafian Bonab, M., Youhanna, S.C., Soria-Hernanz, D.F., 
Martinez-Cruz, B., Douaihy, B., Ghassibe-Sabbagh, M., Rafatpanah, H., 
Ghanbari, M., Whale, J., Balanovsky, O., Wells, R.S., Comas, D., Tyler- 
Smith, C., Zalloua, P.A., The Genographic Consortium, 2012. Afghanistan’s 
Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical 
Events. PLoS ONE 7, e34288. 


Hammer, M.F., Karafet, T.M., Redd, A.J., Jarjanazi, H., Santachiara-Benerecetti, S.,
Soodyall, H., Zegura, S.L., 2001. Hierarchical patterns of global human
Y-chromosome diversity. Mol. Biol. Evol. 18, 1189-1203.


Harris, D.R., Gosden, C., 1996. The Origins and Spread of Agriculture and 
Pastoralism in Eurasia: Crops, Fields, Flocks and Herds. Routledge. 
p. 385. ISBN 1-85728-538-7. 


Harpending, H.C., Batzer, M.A., Gurven, M., Jorde, L.B., Rogers, A.R., Sherry, S.T., 
1998. Genetic traces of ancient demography. Proc. Natl. Acad. Sci. U.S.A. 95, 
1961-1967. 


Harvey Lodish, Arnold Berk, S Lawrence Zipursky, Paul Matsudaira, David 
Baltimore, and James Darnell, 2000. Molecular Cell Biology, 4th ed. W. H. 
Freeman, New York. 


Helena Mangs, A., Morris, B.J., 2007. The Human Pseudoautosomal Region (PAR): 
Origin, Function and Future. Curr Genomics 8, 129-136. 

Henn, B.M., Cavalli-Sforza, L.L., Feldman, M.W., 2012. The great human expansion. 
PNAS 109, 17758-17764. 


Heyer, E., Puymirat, J., Dieltjes, P., Bakker, E., Knijff, P. de, 1997. Estimating Y 
Chromosome Specific Microsatellite Mutation Frequencies using Deep 
Rooting Pedigrees. Hum. Mol. Genet. 6, 799-803. 


Holtkemper, U., Rolf, B., Hohoff, C., Forster, P., Brinkmann, B., 2001. Mutation rates 
at two human Y-chromosomal microsatellite loci using small pool PCR 
techniques. Hum. Mol. Genet. 10, 629-633. 


Huson, D., Richter, D., Rausch, C., Dezulian, T., Franz, M., Rupp, R., 2007. 
Dendroscope: An interactive viewer for large phylogenetic trees. BMC 
Bioinformatics 8, 460. 


Jarrige, J.F., 1986. Excavations at Mehrgarh-Nausharo. Pakistan Archaeology 10, 63-161. 


Jarrige, J.F., 1990. Excavations at Nausharo 1988-1989. Pakistan Archaeology 25, 
193-240. 


Jobling, M.A., Tyler-Smith, C., 2000. New uses for new haplotypes: the human Y 
chromosome, disease and selection. Trends Genet. 16, 356-362. 


Jobling, M.A., Tyler-Smith, C., 2003. The human Y chromosome: an evolutionary 
marker comes of age. Nature Reviews Genetics 4, 598-612. 


Jolliffe, I., 1986. Principal Component Analysis. Springer, New York, NY. 


Kajale, M.D., 1991. Current status of Indian palaeoethnobotany: Introduced and 
indigenous food plants with a discussion of the historical and evolutionary 
development of Indian agriculture and agricultural systems in general. In: J.M. 
Renfrew (Ed.). Edinburgh University Press, Edinburgh. 


Karafet, T., Xu, L., Du, R., Wang, W., Feng, S., Wells, R.S., Redd, A.J., Zegura, S.L., 
Hammer, M.F., 2001. Paternal Population History of East Asia: Sources, 
Patterns, and Microevolutionary Processes. Am J Hum Genet 69, 615-628. 


Karafet, T.M., Mendez, F.L., Meilerman, M.B., Underhill, P.A., Zegura, S.L., 
Hammer, M.F., 2008. New binary polymorphisms reshape and increase 
resolution of the human Y chromosomal haplogroup tree. Genome Res 18, 
830-838. 


Karve, I., 1961. Hindu Society - An Interpretation. Deshmukh Prakashan, Poona. 
Karve, I.K., 1968. Kinship organization in India. Asia Pub. House. 


Kavitha Mary Selvam V.J, 2008. Studies on the genomic diversity of Southern Indian 
breeding isolates. 


Kayser, M., Caglia, A., Corach, D., Fretwell, N., Gehrig, C., Graziosi, G., Heidorn, F., 
Herrmann, S., Herzog, B., Hidding, M., Honda, K., Jobling, M., Krawczak, 
M., Leim, K., Meuser, S., Meyer, E., Oesterreich, W., Pandya, A., Parson, W., 
Penacino, G., Perez-Lezaun, A., Piccinini, A., Prinz, M., Schmitt, C., Roewer, 
L., 1997. Evaluation of Y-chromosomal STRs: a multicenter study. Int. J. 
Legal Med. 110, 125-133, 141-149. 


Kayser, M., Roewer, L., Hedman, M., Henke, L., Henke, J., Brauer, S., Krüger, C., 
Krawczak, M., Nagy, M., Dobosz, T., Szibor, R., de Knijff, P., Stoneking, M., 
Sajantila, A., 2000. Characteristics and Frequency of Germline Mutations at 
Microsatellite Loci from the Human Y Chromosome, as Revealed by Direct 
Observation in Father/Son Pairs. The American Journal of Human Genetics 
66, 1580-1588. 


Khatri A P, 1962. Origin and development of Series II culture in India. Proc. Prehist. 
Soc. 28, 191-208. 


Kivisild, T., Rootsi, S., Metspalu, M., Mastana, S., Kaldma, K., Parik, J., Metspalu, 
E., Adojaan, M., Tolk, H.-V., Stepanov, V., Gölge, M., Usanga, E., Papiha, 
S.S., Cinnioglu, C., King, R., Cavalli-Sforza, L., Underhill, P.A., Villems, R., 
2003. The Genetic Heritage of the Earliest Settlers Persists Both in Indian 
Tribal and Caste Populations. Am J Hum Genet 72, 313-332. 


Kosambi, D.D., 1991. The Culture and Civilisation of Ancient India in Historical Outline. 
Vikas Publishing House, New Delhi. 




Krishnamurti, B., 2003. The Dravidian Languages. Cambridge University Press. 


Krithika, S., Maji, S., Vasulu, T.S., 2009. A microsatellite study to disentangle the 
ambiguity of linguistic, geographic, ethnic and genetic influences on tribes of 
India to get a better clarity of the antiquity and peopling of South Asia. Am. J. 
Phys. Anthropol. 139, 533-546. 


Kruskal, J. B., 1964. Multidimensional scaling by optimizing goodness of fit to a 
nonmetric hypothesis. Psychometrika 29, 1-27. 


Kumar, V., Reddy, A.N., Babu, J.P., Rao, T.N., Langstieh, B.T., Thangaraj, K., 
Reddy, A.G., Singh, L., Reddy, B.M., 2007. Y-chromosome evidence suggests 
a common paternal heritage of Austro-Asiatic populations. BMC Evolutionary 
Biology 7, 47. 

Lacau, H., Gayden, T., Regueiro, M., Chennakrishnaiah, S., Bukhari, A., Underhill, 
P.A., Garcia-Bertrand, R.L., Herrera, R.J., 2012. Afghanistan from a 
Y-chromosome perspective. European Journal of Human Genetics 20, 
1063-1070. 


Lahn, B.T., Pearson, N.M., Jegalian, K., 2001. The human Y chromosome, in the light 
of evolution. Nature Reviews Genetics 2, 207-216. 


Lal, R., 2001. Thematic evolution of ISTRO: transition in scientific issues 
and research focus from 1955 to 2000. Soil and Tillage Research 61 (1-2), 
3-12. doi:10.1016/S0167-1987(01)00184-2 


Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., et al., 2001. Initial 
sequencing and analysis of the human genome. Nature 409, 860-921. 


Leshnik, L.S., Junghans, K.H., 1968. The Harappan “Port” at Lothal: Another View. 
American Anthropologist 70, 911-922. 


Liversage, D., 1989. The spread of food production into India: Some questions of 
interpretation. In: K. Frifelt & P. Sorensen (Eds.), South Asian Archaeology 
1985. Curzon Press, London. 


Lone, F.A., Khan, M., Buth, G.M., 1993. Palaeoethnobotany: plants and ancient man 
in Kashmir. Oxford & IBH Pub. Co. 


Luca L. Cavalli-Sforza, 1988. The Basque population and ancient migrations in 
Europe. Munibe (Antropologia y Arqueologia) 129-137. 


Majumder, P.P., 1998. People of India: Biological diversity and affinities. 
Evolutionary Anthropology: Issues, News, and Reviews 6, 100-110. 




Mallory, J.P., 1989. In Search of the Indo-Europeans: Language, Archaeology and 
Myth. Thames and Hudson Ltd., London. 


McAlpin DW, 1981. Proto-Elamo-Dravidian: The evidence and its implications. 
Trans Am Philos Soc 71, 3-155. 


McElreavey, K., Quintana-Murci, L., 2005. A population genetics perspective of the 
Indus Valley through uniparentally-inherited markers. Ann. Hum. Biol. 32, 
154-162. 


Menozzi Paolo, Cavalli-Sforza, Piazza Alberto, 1994. The History and Geography of 
Human Genes. Princeton University Press, Princeton, NJ. 


Mishra, S., 1992. The Age of the Acheulian in India: New Evidence. Current 
Anthropology 33, 325-328. 


Misra, V.N., 1987. Middle Pleistocene adaptations in India. In: Soffer, O. (Ed.), 
The Pleistocene Old World: Regional Perspectives. Plenum Press, New York. 


Misra, V.N., 2001. Prehistoric human colonization of India. J. Biosci. 26, 491-531. 


Mohyuddin, A., Ayub, Q., Underhill, P.A., Tyler-Smith, C., Mehdi, S.Q., 2006a. 
Detection of novel Y SNPs provides further insights into Y chromosomal 
variation in Pakistan. Journal of Human Genetics 51, 375-378. 

Mullis, K.B., 1990. The unusual origin of the polymerase chain reaction. Sci. Am. 
262, 56-61, 64-65. 

Murty, M.L.K., 1966. Stone Age Cultures of Chittoor District, Andhra Pradesh (Ph.D. 
Dissertation). 


Murty, M.L.K., 1989. Pre- and Protohistoric Andhra Pradesh up to 500 BC. Orient 
Blackswan. 


Nasidze, I., Sarkisian, T., Kerimov, A., Stoneking, M., 2003. Testing hypotheses of 
language replacement in the Caucasus: evidence from the Y-chromosome. 
Hum. Genet. 112, 255-261. 


Nanavutty, P (1970), The Parsis, New Delhi: National Book Trust 

Nebel, A., Landau-Tasseron, E., Filon, D., Oppenheim, A., Faerman, M., 2002. 
Genetic evidence for the expansion of Arabian tribes into the southern 
Levant and North Africa. Am J Hum Genet 70, 6 


Nei M, 1987. Molecular Evolutionary Genetics. Columbia University Press, New 
York, NY, USA. 




Nene, Y.L., 2005. Rice Research in South Asia through Ages. Asian Agri-History 9 (2), 
85-106. 


Obert, R., 2005. Fisher exact, Excel add-in. (Online). 


Olsvik, O., Wahlberg, J., Petterson, B., Uhlén, M., Popovic, T., Wachsmuth, I.K., 
Fields, P.I., 1993. Use of automated sequencing of polymerase chain reaction- 
generated amplicons to identify three types of cholera toxin subunit B in 
Vibrio cholerae O1 strains. J. Clin. Microbiol. 31, 22-25. 


Ota, T., Kimura, M., 1973. A model of mutation appropriate to estimate the number 
of electrophoretically detectable alleles in a finite population. Genet. Res. 22, 
201-204. 


Paddayya, K., 1973. Investigations Into the Neolithic Culture of the Shorapur Doab, 
South India. BRILL. 


Parpola, A., 1994. Deciphering the Indus Script. Cambridge University Press. 


Petraglia, M., Korisettar, R., Boivin, N., Clarkson, C., Ditchfield, P., Jones, S., Koshy, 
J., Lahr, M.M., Oppenheimer, C., Pyle, D., Roberts, R., Schwenninger, J.-L., 
Arnold, L., White, K., 2007. Middle Paleolithic Assemblages from the Indian 
Subcontinent Before and After the Toba Super-Eruption. Science 317, 
114-116. 


Petraglia, M.D., Korisettar, R., 1998. Early Human Behaviour in Global 
Context: The Rise and Diversity of the Lower Paleolithic Period. Routledge. 


Pitchappan, R.M., 1998. Diversity And Dynamics Of Populations And Disease 
Susceptibility. In: The proceedings of the DAE symposium, PGIBMS 
Taramani, Chennai, India. 


Pitchappan, R.M., 2002. Castes, migration, immunogenetics and infectious diseases in 
south India. Community Genet 5, 157-161. 


Possehl, G.L., 1990. Revolution in the Urban Revolution: The Emergence of Indus 
Urbanization. Annual Review of Anthropology 19, 261-282. 


Possehl, G.L., 1996. Indus Age: the writing system. University of Pennsylvania Press. 


Pritchard Jonathan K, n.d. Inference of Population Structure Using Multilocus 
Genotype Data. 

Qamar, R., Ayub, Q., Mohyuddin, A., Helgason, A., Mazhar, K., Mansoor, A., Zerjal, 
T., Tyler-Smith, C., Mehdi, S.Q., 2002. Y-chromosomal DNA variation in 
Pakistan. Am. J. Hum. Genet. 70, 1107-1124. 




Quintana-Murci, L., Krausz, C., Zerjal, T., Sayar, S.H., Hammer, M.F., Mehdi, S.Q., 
Ayub, Q., Qamar, R., Mohyuddin, A., Radhakrishna, U., Jobling, M.A., Tyler- 
Smith, C., McElreavey, K., 2001. Y-Chromosome Lineages Trace Diffusion 
of People and Languages in Southwestern Asia. Am J Hum Genet 68, 
537-542. 


R Development Core Team, 2010. R: A language and environment for statistical 
computing. 


Ramana, G.V., Su, B., Jin, L., Singh, L., Wang, N., Underhill, P., Chakraborty, R., 
2001. Y-chromosome SNP haplotypes suggest evidence of gene flow among 
caste, tribe, and the migrant Siddi populations of Andhra Pradesh, South India. 
European Journal of Human Genetics 9, 695-700. 


Reddy, B.M., Langstieh, B.T., Kumar, V., Nagaraja, T., Reddy, A.N.S., Meka, A., 
Reddy, A.G., Thangaraj, K., Singh, L., 2007. Austro-Asiatic Tribes of 
Northeast India Provide Hitherto Missing Genetic Link between South and 
Southeast Asia. PLoS ONE 2, e1141. 


Rendell, H.M., 1989. Pleistocene and Palaeolithic Investigations in the Soan Valley, 
Northern Pakistan (British Archaeological Reports (BAR) International). 
British Archaeological Reports. 


Renfrew, C., 1996. Language families and the spread of farming. In: Harris, D.R. (Ed.), 
The Origins and Spread of Agriculture and Pastoralism in Eurasia. 
Smithsonian Institution Press, Washington, DC. 


Renfrew, C., 1988. Archaeology and language: the puzzle of Indo-European origins. 
J. Cape. 


Roewer, L., Krawczak, M., Willuweit, S., Nagy, M., Alves, C., Amorim, A., 
Anslinger, K., Augustin, C., Betz, A., Bosch, E., Caglia, A., Carracedo, A., 
Corach, D., Dekairelle, A.F., Dobosz, T., Dupuy, B.M., Füredi, S., Gehrig, C., 
Gusmão, L., Henke, J., Henke, L., Hidding, M., Hohoff, C., Hoste, B., Jobling, 
M.A., Kargel, H.J., de Knijff, P., Lessig, R., Liebeherr, E., Lorente, M., 
Martinez-Jarreta, B., Nievas, P., Nowak, M., Parson, W., Pascali, V.L., 
Penacino, G., Ploski, R., Rolf, B., Sala, A., Schmidt, U., Schmitt, C., 
Schneider, P.M., Szibor, R., Teifel-Greding, J., Kayser, M., 2001. Online 
reference database of European Y-chromosomal short tandem repeat (STR) 
haplotypes. Forensic Sci. Int. 118, 106-113. 


Rodda & Ubertini (2004). The Basis of Civilization - Water Science? International 
Association of Hydrological Sciences. 279. ISBN 1-901502-57-0. 




Rootsi, S., Myres, N.M., Lin, A.A., Jarve, M., King, R.J., Kutuev, I., Cabrera, V.M., 
Khusnutdinova, E.K., Varendi, K., Sahakyan, H., Behar, D.M., Khusainova, 
R., Balanovsky, O., Balanovska, E., Rudan, P., Yepiskoposyan, L., 
Bahmanimehr, A., Farjadian, S., Kushniarevich, A., Herrera, R.J., Grugni, V., 
Battaglia, V., Nici, C., Crobu, F., Karachanak, S., Kashani, B.H., Houshmand, 
M., Sanati, M.H., Toncheva, D., Lisa, A., Semino, O., Chiaroni, J., Cristofaro, 
J.D., Villems, R., Kivisild, T., Underhill, P.A., n.d. Distinguishing the 
co-ancestries of haplogroup G Y-chromosomes in the populations of Europe 
and the Caucasus. European Journal of Human Genetics. 


Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D, Amorim A, Amos 
W, Armenteros M, Arroyo E, Barbujani G, Beckman G, Beckman L, 
Bertranpetit J, Bosch E, Bradley DG, Brede G, Cooper G, Côrte-Real HB, de 
Knijff P, Decorte R, Dubrova YE, Evgrafov O, Gilissen A, Glisic S, Gölge M, 
Hill EW, Jeziorowska A, Kalaydjieva L, Kayser M, Kivisild T, Kravchenko 
SA, Krumina A, Kucinskas V, Lavinha J, Livshits LA, Malaspina P, Maria S, 
McElreavey K, Meitinger TA, Mikelsaar AV, Mitchell RJ, Nafa K, Nicholson 
J, Norby S, Pandya A, Parik J, Patsalis PC, Pereira L, Peterlin B, Pielberg G, 
Prata MJ, Previderé C, Roewer L, Rootsi S, Rubinsztein DC, Saillard J, Santos 
FR, Stefanescu G, Sykes BC, Tolun A, Villems R, Tyler-Smith C, Jobling 
MA, 2000. Y-chromosomal diversity in Europe is clinal and influenced 
primarily by geography, rather than by language. Am J Hum Genet 67, 
1526-1543. 


SAGA Development Team, 2008. System for Automated Geoscientific Analyses 
(SAGA GIS). 

Sahoo, S., Singh, A., Himabindu, G., Banerjee, J., Sitalaximi, T., Gaikwad, S., 
Trivedi, R., Endicott, P., Kivisild, T., Metspalu, M., Villems, R., Kashyap, 
V.K., 2006. A prehistory of Indian Y chromosomes: Evaluating demic 
diffusion scenarios. PNAS 103, 843-848. 


Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for 
reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406-425. 


Sankalia H D, 1956. Animal fossils and Palaeolithic industries from the Pravara basin 
at Nevasa, district Ahmednagar. Ancient India 12, 35-56. 


Saraswat, K.S., 1993. Plant economy of Late Harappans at Hulas. Puratattva 23, 1-12. 


Sastri Nilakanta, 1975. A History of South India, fourth ed. Oxford University Press, 
Madras. 




Semino, O., Passarino, G., Oefner, P.J., Lin, A.A., Arbuzova, S., Beckman, L.E., De 
Benedictis, G., Francalacci, P., Kouvatsi, A., Limborska, S., Marcikiae, M., 
Mika, A., Mika, B., Primorac, D., Santachiara-Benerecetti, A.S., Cavalli- 
Sforza, L.L., Underhill, P.A., 2000. The genetic legacy of Paleolithic Homo 
sapiens sapiens in extant Europeans: a Y chromosome perspective. Science 
290, 1155-1159. 


Semino, O., Magri, C., Benuzzi, G., Lin, A.A., Al-Zahery, N., Battaglia, V., Maccioni, 
L., Triantaphyllidis, C., Shen, P., Oefner, P.J., Zhivotovsky, 
L.A., King, R., Torroni, A., Cavalli-Sforza, L.L., Underhill, P.A., 
Santachiara-Benerecetti, A.S., 2004. Origin, diffusion, and differentiation of 
Y-chromosome haplogroups E and J: inferences on the neolithization of Europe 
and later migratory events in the Mediterranean area. Am J Hum Genet 74, 
1023-1034. 


Sengupta, S., Zhivotovsky, L.A., King, R., Mehdi, S.Q., Edmonds, C.A., Chow, 
C.-E.T., Lin, A.A., Mitra, M., Sil, S.K., Ramesh, A., Usha Rani, M.V., Thakur, 
C.M., Cavalli-Sforza, L.L., Majumder, P.P., Underhill, P.A., 2006. Polarity 
and Temporality of High-Resolution Y-Chromosome Distributions in India 
Identify Both Indigenous and Exogenous Expansions and Reveal Minor 
Genetic Influence of Central Asian Pastoralists. Am J Hum Genet 78, 
202-221. 


Shah, A.M., Rakesh Tamang, Priya Moorjani, Deepa Selvi Rani, Periyasamy 
Govindaraj, Gururaj Kulkarni, Tanmoy Bhattacharya, Mohammed S. 
Mustak, Bhaskar L V S K, Alla G Reddy, Chris Tyler-Smith, Lalji Singh, 
Kumarasamy Thangaraj, 2011. Indian Siddis: African Descendants with 
Indian Admixture. Am J Hum Genet. 


Shanmugalakshmi, S., Balakrishnan, K., Manoharan, K., Pitchappan, R.M., 2003. 
HLA-DRB1*, -DQB1* in Piramalai Kallars and Yadhavas, two Dravidian- 
speaking castes of Tamil Nadu, South India. Tissue Antigens 61, 451-464. 


Sharma, G.R., Misra, V.D., Mandal, D., Misra, B.B., Pal, J.N., 1980. Beginnings of 
Agriculture (Epipalaeolithic to Neolithic): Excavations at Chopani-Mando, 
Mahadaha and Mahagara. Abinash Prakashan, Allahabad. 


Sharma, G., Tamang, R., Chaudhary, R., Singh, V.K., Shah, A.M., Anugula, S., Rani, 
D.S., Reddy, A.G., Eaaswarkhanth, M., Chaubey, G., Singh, L., Thangaraj, K., 
2012. Genetic Affinities of the Central Indian Tribal Populations. PLoS ONE 
7, e32546. 




Sharma, S., Rai, E., Sharma, P., Jena, M., Singh, S., Darvishi, K., Bhat, A.K., 
Bhanwer, A.J.S., Tiwari, P.K., Bamezai, R.N.K., 2009. The Indian origin of 
paternal haplogroup R1a1* substantiates the autochthonous origin of 
Brahmins and the caste system. J. Hum. Genet. 54, 47-55. 


Shi, H., Zhong, H., Peng, Y., Dong, Y.-L., Qi, X.-B., Zhang, F., Liu, L.-F., Tan, S.-J., 
Ma, R.Z., Xiao, C.-J., Wells, R.S., Jin, L., Su, B., n.d. Y chromosome 
evidence of earliest modern human settlement in East Asia and multiple 
origins of Tibetan and Japanese populations. BMC Biology 6, 45. 


Siddiqui, M.R., Meisner, S., Tosh, K., Balakrishnan, K., Ghei, S., Fisher, S.E., 
Golding, M., Shanker Narayan, N.P., Sitaraman, T., Sengupta, U., Pitchappan, 
R., Hill, A.V., 2001. A major susceptibility locus for leprosy in India maps to 
chromosome 10p13. Nat. Genet. 27, 439-441. 


Singh, K.S., 1992. People of India: An Introduction. South Asia Books. 


Skaletsky, H., Kuroda-Kawaguchi, T., Minx, P.J., Cordum, H.S., Hillier, L., Brown, 
L.G., Repping, S., Pyntikova, T., Ali, J., Bieri, T., Chinwalla, A., Delehaunty, 
A., Delehaunty, K., Du, H., Fewell, G., Fulton, L., Fulton, R., Graves, T., Hou, 
S.-F., Latrielle, P., Leonard, S., Mardis, E., Maupin, R., McPherson, J., Miner, 
T., Nash, W., Nguyen, C., Ozersky, P., Pepin, K., Rock, S., Rohlfing, T., 
Scott, K., Schultz, B., Strong, C., Tin-Wollam, A., Yang, S.-P., Waterston, 
R.H., Wilson, R.K., Rozen, S., Page, D.C., 2003. The male-specific region of 
the human Y chromosome is a mosaic of discrete sequence classes. Nature 
423, 825-837. 


Slatkin, M., Hudson, R.R., 1991. Pairwise comparisons of mitochondrial DNA 
sequences in stable and exponentially growing populations. Genetics 129, 
555-562. 


Southworth, F., 2005. Linguistic Archaeology of South Asia. Routledge-Curzon, 
London. 


Su, B., Xiao, C., Deka, R., Seielstad, M., Kangwanpong, D., Xiao, J., Lu, D., 
Underhill, P., Cavalli-Sforza, L., Chakraborty, R., Jin, L., 2000. Y 
chromosome haplotypes reveal prehistorical migrations to the Himalayas. 
Human Genetics 107, 582-590. 


Su, B., Xiao, J., Underhill, P., Deka, R., Zhang, W., Akey, J., Huang, W., Shen, D., 
Lu, D., Luo, J., Chu, J., Tan, J., Shen, P., Davis, R., Cavalli-Sforza, L., 
Chakraborty, R., Xiong, M., Du, R., Oefner, P., Chen, Z., Jin, L., 1999. Y- 
Chromosome Evidence for a Northward Migration of Modern Humans into 
Eastern Asia during the Last Ice Age. Am J Hum Genet 65, 1718-1724. 




Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: Molecular Evolutionary 
Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596-1599. 


Terra, H. de, Paterson, T.T., 1939. Studies on the Ice Age in India and Associated 
Human Cultures [Paul D. Kryninell: Petrology of the Karewa Lake Beds.]. 
Carnegie Institution of Washington. 


Thangaraj, K., Naidu, B.P., Crivellaro, F., Tamang, R., Upadhyay, S., Sharma, V.K., 
Reddy, A.G., Walimbe, S.R., Chaubey, G., Kivisild, T., Singh, L., 2010. The 
Influence of Natural Barriers in Shaping the Genetic Structure of Maharashtra 
Populations. PLoS ONE 5, e15283. 


Thangaraj, K., Ramana, G.V., Singh, L., 1999. Y-chromosome and mitochondrial 
DNA polymorphisms in Indian populations. Electrophoresis 20, 1743-1747. 


Thanseem, I., Thangaraj, K., Chaubey, G., Singh, V.K., Bhaskar, L.V., Reddy, B.M., 
Reddy, A.G., Singh, L., 2006. Genetic affinities among the lower castes and 
tribal groups of India: inference from Y chromosome and mitochondrial DNA. 
BMC Genetics 7, 42. 


The States Reorganisation Act, 1956. Central Government Act. 


Tosh, K., Meisner, S., Siddiqui, M.R., Balakrishnan, K., Ghei, S., Golding, M., 
Sengupta, U., Pitchappan, R.M., Hill, A.V.S., 2002. A region of chromosome 
20 is linked to leprosy susceptibility in a South Indian population. J. Infect. 
Dis. 186, 1190-1193. 


Trautmann, T.R., 1981. Dravidian Kinship. Cambridge Studies in Social 
Anthropology 36. Cambridge University Press, Cambridge. 


Trivedi R, Sahoo S., Singh A, Bindu G, Tandom M, Gaikwad S, 2008. Genetic 
imprints of Pleistocene origin of Indian populations: A comprehensive 
phylogeographic sketch of Indian Y-chromosomes. International Journal of 
Human Genetics 8, 97-118. 


Underhill, P.A., Jin, L., Lin, A.A., Mehdi, S.Q., Jenkins, T., Vollrath, D., Davis, 
R.W., Cavalli-Sforza, L.L., Oefner, P.J., 1997. Detection of Numerous Y 
Chromosome Biallelic Polymorphisms by Denaturing High-Performance 
Liquid Chromatography. Genome Res 7, 996-1005. 


Underhill, P.A., Passarino, G., Lin, A.A., Shen, P., Mirazon Lahr, M., Foley, R.A., 
Oefner, P.J., Cavalli-Sforza, L.L., 2001. The phylogeography of Y 
chromosome binary haplotypes and the origins of modern human populations. 
Annals of Human Genetics 65, 43-62. 




Underhill, P.A., Shen, P., Lin, A.A., Jin, L., Passarino, G., Yang, W.H., Kauffman, E., 
Bonné-Tamir, B., Bertranpetit, J., Francalacci, P., Ibrahim, M., Jenkins, T., 
Kidd, J.R., Mehdi, S.Q., Seielstad, M.T., Wells, R.S., Piazza, A., Davis, R.W., 
Feldman, M.W., Cavalli-Sforza, L.L., Oefner, P.J., 2000. Y chromosome 
sequence variation and the history of human populations. Nat. Genet. 26, 
358-361. 


Wakankar, V.S., Brooks, R.R.R., 1976. Stone Age Painting in India. 
D.B. Taraporevala, Bombay. 


Weale, M., Yepiskoposyan, L., Jager, R., Hovhannisyan, N., Khudoyan, A., Burbage- 
Hall, O., Bradman, N., Thomas, M., 2001. Armenian Y chromosome 
haplotypes reveal strong regional structure within a single ethno-national 
group. Human Genetics 109, 659-674. 


Weber, S.A., 1991. Plants and Harappan subsistence: An example of stability and 
change from Rojdi. 


Wells, R.S., Yuldasheva, N., Ruzibakiev, R., Underhill, P.A., Evseeva, I., Blue-Smith, 
J., Jin, L., Su, B., Pitchappan, R., Shanmugalakshmi, S., Balakrishnan, K., 
Read, M., Pearson, N.M., Zerjal, T., Webster, M.T., Zholoshvili, I., 
Jamarjashvili, E., Gambarov, S., Nikbin, B., Dostiev, A., Aknazarov, O., 
Zalloua, P., Tsoy, I., Kitaev, M., Mirrakhimov, M., Chariev, A., Bodmer, 
W.F., 2001. The Eurasian Heartland: A continental perspective on 
Y-chromosome diversity. PNAS 98, 10244-10249. 


Wells, S., Read, M., 2002. The Journey of Man: A Genetic Odyssey. Princeton 
University Press. 


Willey, G.R., Phillips, P., Lyman, R.L., O’Brien, M.J., 2001. Method and theory in 
American archaeology. University of Alabama Press. 


Wilson Ian J., Weale Michael E., Balding David J., 2003. Inferences from DNA Data: 
Population Histories, Evolutionary Processes and Forensic Match 
Probabilities. Journal of the Royal Statistical Society. Series A (Statistics in 
Society) 166, 155-201. 


Witzel, M., 2005. Central Asian roots and acculturation in Indian subcontinent: 
linguistic and archaeological evidence from Western Central Asia, the 
Hindukush and northwestern Indian subcontinent for early Indo-Aryan 
language and religion. In: Osada, T. (Ed.), Linguistics, Archaeology and the 
Human Past. Research Institute for Humanity and Nature, Kyoto. 




Xue, Y., Zerjal, T., Bao, W., Zhu, S., Shu, Q., Xu, J., Du, R., Fu, S., Li, P., Hurles, 
M.E., Yang, H., Tyler-Smith, C., 2006. Male Demography in East Asia: A 
North-South Contrast in Human Population Expansion Times. Genetics 172, 
2431-2439. 


YFiler.ABI.Online.Protocol, 2012. 


Zalloua, P.A., Xue, Y., Khalife, J., Makhoul, N., Debiane, L., Platt, D.E., Royyuru, 
A.K., Herrera, R.J., Hernanz, D.F.S., Blue-Smith, J., Wells, R.S., Comas, D., 
Bertranpetit, J., Tyler-Smith, C., 2008. Y-Chromosomal Diversity in Lebanon 
Is Structured by Recent Historical Events. Am J Hum Genet 82, 873-882. 


Zeuner F E, Allchin B, 1956. The microlithic sites of Tinnevelly district, Madras 
State. Ancient India 12, 4-20. 


Zhivotovsky, L.A., Underhill, P.A., Cinnioglu, C., Kayser, M., Morar, B., Kivisild, 
T., Scozzari, R., Cruciani, F., Destro-Bisol, G., Spedini, G., Chambers, G.K., 
Herrera, R.J., Yong, K.K., Gresham, D., Tournev, I., Feldman, M.W., 
Kalaydjieva, L., 2004. The effective mutation rate at Y chromosome short 
tandem repeats, with application to human population-divergence time. Am. J. 
Hum. Genet. 74, 50-61. 


Zhong, H., Shi, H., Qi, X.-B., Xiao, C.-J., Jin, L., Ma, R.Z., Su, B., 2010. Global 
distribution of Y-chromosome haplogroup C reveals the prehistoric migration 
routes of African exodus and early settlement in East Asia. J. Hum. Genet. 55, 
428-435. 


152 


Ms Apuen 


c 


eipuy Jo Aaaing jeorsojodorquy ‘eT weyog espuoley ‘y8urg ysomng Jeu 


I 


201d “JO|[F St 


Prey pooy a]de1g 
ayy Jo Sursaq “sjeumrue 
1124 oy) erjeq Bye} peop Jo usAd 
JO} powtojsod yeyoueneg ‘eASeg} —- yeour “Ys sopore 
sak pomoyye| pomoyye st efndiyeyg ‘qeaurjiyeg| — ssoppop -30A UON sok ooquieg oyeur L SPTTOUA, oIpewioN, eyemoy| ZT 
sraqny ‘s}oo1 
[eyouemeg ‘remofl ‘yeour Suuoyyes 
pomoyye| pamoyye ‘qeourpLyeg TAapiny} = -3a,A ON pooy pue Suyuny L ma dN vaeseAl [I 
aonpoud }so103 uo eieg Jorey 
ansuo} spofqe ary ‘Arepueqsny ‘JOTEH ‘IpeasenN, 
‘AuowlaI99 Teyosenyeg] [eutue ‘qop| sasqnd ‘or yeunrue ‘ueSnger] jeyeuryourg 
pomoyye| pamoyye Suruen ‘qeourpiyeg] eqeq pop ‘Soa UON, Sok nerefn5 nerefn5 ‘oInNoUsy L TOY emyyeY, ‘ndrepy eBoy ‘eporeg eaey| OT 
Auowie199 reSeucelia] eyjueyeqes 
Surmeu jeyouenyeg n srnoqry ‘ereqaes ‘yonreyg 
sok pomoyye| pomoyye ‘omyoueg ‘jeourpineg Sok yereiny| erefnosmpereypy Jemnynousy L Leyes enyseseyeyy | ‘Teqyor ‘josey ‘yeng eipomey] 6 
“tifeq ‘oye}0d 
‘uoruo sprodoyg 
jeyouenneg Sym saying ‘stodaoy 
Sok pemore) pemorye ‘Teoureyed “BoA UON SOA TYPUIS/TpONy] THRO oIpeaon, L TapueyA “esepqy: fuurg}] yoy “ID ueypren| 8 
epouy 
uourpesy ode] [IA ‘eyjueyseueg 
Auouras99 jeyouenneg ‘omngjnonse sJepyeg‘ jared “euesoyoy|l sreqes 
ou pomoyye| pamoyye ainsuo], ‘jeourpineg uelIEjo30 A, Sok nerefn5 nerefn5 “IOLUE AY fe) eueluy eyuy ‘peqepounyy| Ted} ok 
Auowis199 yeyoreneg remof pue 
pomoyye Suruen ‘qeourpeneg 011 ‘S9A UON nerefn5 nein aunjynousy e) [led TON you] 9 
seyueyLry ‘s 
wieyueyyiiy| uersejo3o~, Sok slopeiy, fe) mer} 
WPeuwiog LYS Jo 
somnjdynos opeyl 6) undwog unuyerg] =p 
qeseuAryg, 
SoA s}sISo]09UaH) ‘9 uereyo ueyjseley ‘peseuny| TARD € 
RUOUISIOO 
skoq Surpoay 
0} poutoyiad [82199 SI Ayrap 
st Auouraieo} = pu Surmeu yeyorenneg | [my ‘nuyst A, u 
pouqryosd Surpeomy ‘oInsuoL, ‘qeourpLyeg ‘eAlys| uelejoSo A, Sok eSeuraoq pooyjsoug 3) Tyo uruyerg] = Z 
sramoqey L TysqeH BYRIVUILYY Ip ‘peseune IPpIs T 
ulsnoy = jasepiseuras sjengi4 aq saureu 
asda opti 90.10AIP Ayxaqna sjengis yyarg sours ayuy sand pooy Pisses qd119g¢ asensueq uonedns99 yyjaseg| wuonkg suOHeI0] 13YIO uonRs07q] = OLNSIG uonendog ON'S 


yervfny Jo suoyerndod Apnyjs jo sjiejop s1ydeisouyyy :e] xipuoddy 


STRIFE 
TeMjOy=—byyoddius=A#+ ere Myo y—bayj dios pasensuey we 
UPIEIN=bpe 89 AU NPA Ag P=prEsqsusboaO.indNdS9T 


TRAMP “OH 


bYO 


= TMAA=19{Sx009/Ur 09°9[s008 syo0g//-dyy “erayseq| BITEMIOY al 
yea 
OSTEI=JPOQUNOT%BALSPA—D 7p “yer0yq 
SBedoUO=—\#OOMAV9O8AD0=PHABOCKHMBIIOIOP AP ‘NOH A 
=CDXYIN=OF X=PSBPUS=[YHOGII+ALSEA=DPBOTT Vd “Hyerearys uredoxo 
=Sdw DSI ATYNEAASP=PlL SYOOd UT 09°9[8003 syoog//-diyq uoneUterZ “IeIAGN oBeyIIA BARSEA Il 
Sur00}e} 
: eroyyd| 
dW |ssunured ‘tuefn ‘erosep 
Z Woy poywisIpl] eoyIg] ores) ‘Toy ‘Tyearq eBMTEY Or 
eayysereye yl 
Woy poreIsi Al feng BIpOyey 6 
asTy=SVe yseuruey 
TeMjOy—byyoddius—a#+ ele MIO y—bayj dios pasensuey +e euysiry 
UPIE=bpe DS A UNC AASP=prgsqsusbosO.indNdS91 Sump 
= TMA Al=10{Sy00q/Ur-09°9[3008'sy00q//:ciyYq yeung unojiod sof] Leyprey 8 
JOLNSIP uvasayeyl 
0} way) ‘uey\seley 
ur indjereyg 
0} TTUAYse yy 
Woy powers uoneUtar: ou [ved] L 
SSTRI=JHVELEIMITYT7ZFOOT YH OA=D 
adedou0=A#O EM VIOMADO=P9A79 T=Is HOCH PSUFV 
ANQOO AWG LAC ed=19 739 X= "SH US=[HPOMNOIUEIM TAD] 
ABOTTAJOOAXA=SISPA ADD HAXA\=S1029[9=90IN S79 IE 
BInd+J0+1[04=Dp¥E6IVd=94] 79 £69 Vd=977 DOM6MXA 
= AOp=Pizsyooq/ur-o9°ays005"syoog//:dyy UWOHeUIaI_) TOO 9 
Ayan 
uoneUra_, wu09 wer ¢ 
tndurog 
I ururye lg v 
1ayey) € 
Tygon 
I uoneulaID Urory eg) c 
fenng IPPIS I 
Sa 
: enjt uysnoo 
ddUI1IJIY A1O}STF] 10}Sad Uy |a}9e.1eYyD ead sjeansag | ysouid) s1ayO jojeaed uonepndog ON’S 


“fyeyyy‘saoyuses epaypy‘diysiom piempurlyyy 
yeurrue wWIO0FI0g‘(Ie9k yesoouy‘| Moueds pue {MIG IeYsTOA 
MoN)eMpedipny omjeu pue} sy9090 jdsoxo ‘mdespueyo 
Wee yy uoow ‘uns diysioj,| — sdoqe9 YSo]y Sox (VV) D105, sio1904}23-pooy ‘pesemy dA TyeAriewy snyloy] ZI 
Spuoy JO s}soLig 
‘sjsou Suyuny oyeyl ‘soporre popuen 
yeourpineg SOA UON Sox (Aq) twejoy] uspoom oyeu ‘ainyNoLsy “eypre ny Teunear yz Ievr‘7s wejoy] [1 
Toy ‘TTemig “eloseq ysod‘ya0q yerefny Jo prseA, 
‘eAsvarume ng ‘ejOd ou -30A “enysereyeyy 
‘tureyoued Sey ‘exyel| UON ‘reMel] SuLIeal o7}129 ‘spoom jo ojnyq 
[ep ‘mosues ‘oH ‘Tep ‘9011 (AD TeA| _SuNIMD ‘uoreartno Apped ‘“yIs@N ‘oueyL LeS‘19'r THEA] OT 
“SBN ‘oap-oeq OU-30A ON 
‘ovmeyg ‘eqopueyy} ‘eA “UT[seu yrysen, 
fayetpry “ysereg “laop wress Tey] “remof ‘oory (aD) ewyoyy voreanqns ysno[d yerefn ‘oueyy ‘ernyqd 600°¢9°Z BUIOM] 6 
pure ieurter JeaA] ‘Aayuour sey Suryeul youg “‘Surysty “aiquio |, “fersoyy yrysen, 
“eseyy “eA[Snyy uMolg pue ‘Sulioyjes Jojuny “Suryew ‘adsey ‘uorSo[epy ‘sung ‘WIsuRjeY 
Temp “TOY ‘Ajorp OBeT[LA-Iquiry| _Joeq OU -uOg Sox (aD Hepey| TeooreyD “Supyeur nyooye) ‘opoyseyy ‘ouryL ‘pesiey CO9PLT sueyey] 8 
sox suey| ZL 
Tequieyey ‘oyed 
‘ZOIARN poysuer soy duro} 
“‘osa0q ON 1s0y}IeZ oxy Ur drysi0 nerefny Aq 
[eS pepsloy oyerqo]aD ‘epzeyy inyzy SOA UON Sox pooeydas sosieg SJopei] pue SIOAROM qerefny Aequog 000‘06 sosieg| 9 
Tueaeyg ‘eqopueyy| as—nd qeoym 
-AJOIP JTRI ‘SOA UON (aD typereyy SIOLUG AA seypereyl| ¢ 
eiosep pue 
BUISTYS/IJOH ‘s10jsooue 
“eqomeyg ‘eqepueyy SIOABOM 
“eAop eilg drysio\ eqopuryy Sox (aD tyyereyy oom ‘siorem ‘spredays jeqniny ‘eyjoH S6S‘°SE sieSueyg| 
lemysosor SoA (aD) Tyee S|SoLg uevardyyD urmyeig} = ¢ 
Ayop Ayrurey ifejeg “eqopueyy ueliejoson 
diysio M “yeIAeN, “Tyyeareg oe] (aD typereyy S|SoLg oung ‘indeyjoy eyyseysog urmyeig] 7 
KoIO0S sroroyyes BSSHIO 
Teyapned (aq) 1puoyH Joyuny syuesead ‘sr9yn1 “dA enysereyeyy spuosfey pue spuoy| | 
s[Banysa, aoue} Loy sana 100, waysks ue, asensue’ uonednss, shai suO]eI0] 13) uones0 INST ay phe uonejndo, ON" 
[BANSOT soquy ped pooy 4 1D 1 LY Oo wAuoukg RIOT YO Heo, PLYSIG uonendog 1002 Heyndod N’S 


Appendix 1b: Ethnographic details of study populations of Maharashtra


suInIp os) uonoolp 
‘oourp reavyg yynos ur 
pur osoyyeyo] proy wim Aing sox snyioy] ZT 
rey 
SpTryo ou} Suraeyg 
oureuins outes} = aynyJ Ooqureq ‘KuoUlo199 
oy} UTM Arreur jou oq] ‘oyeddey, ‘wmnig Ang Surrey, Sok Sox Sox wejoy| IT 
sSunuted ye 
“‘souep [oud osieq JOqUIM ur o9R]d 
vIRURyY WO poyessI] ‘gourp edie], oyeudID -Auouwlol99 SurueN, oye} soseLueyyy Te} OT 
oyu} = [engi Surz0g eq PpoMoyTy euyoy| 6 
yerefny Woy soueuins 
aqin [Iyg Wor, poatiaq uryjiM Arreur jou Op suog}] souep ipureyq oyeUdI_D ON. suepey] g 
TOT ISUTOT FISTS 
suey Iepag ‘soquy 
viefueg 0} poyoene 
ore ssueul oyeyq ON ON suey] L 
0} samny[nA oy} 
JO} [JOM/aOUgTIS ueLyYyseioz 
Aequiog 0} voy) pue yerelny JO JOMO} 2q 0} pyIys ou 
‘uefes 0} uel] Woy powers] oy Jo uMoIy}) OJ poutroysad st 
“uoosop uRIsIog ore Aoy | sioXevid Aptep ¢ JOIZO ore sorpog| Auowto199 o}0fAeN, sosieg| 9 
siquny wo Sueidg oyRWOID pouqryoig pouqryoig seyjereyy] ¢ 
poring 
pure poydaj]oo ore Aureuwi9}ur 
vinyl ‘URARpUlI A, Angod} souoq pue soyse Auowi199 jou op 
Woy payessipy ‘yesrefny TAQ ‘oouerp]| == “Jung Jy “Anq oy} suojsod] ouweums oures 
TeMILYJLY Wo poywessry eles resueq pure oyewsID ummyeig yim o]doog siesueyq| + 
pouliojiod 
uefndnyseys st weueXeuerds ueaedyyD unuyeig| ¢ 
Teursyeur PIs oyeur 
jo s3uryqis Jo} powoysod 
Aurewl jou op ST weu 
oyeuldID sorpoa infe X eXeuednefunyy eyseysog urmyeig| 7 
spuosley pure spuoy| | 
SoseLuieu | sosumaeur 
ISULLAVUIT syenjit ON 
A.10}SIF{ 10)s990y S$.19)IBILYD LO soueUIng [engin engi yjeoqg sjengry yg SILI SIO ursnoo ursnoo IS1OI AIC ‘ uoyendog 
Pyeaeg ssolg OE Awoand S 


Reference: Castes and Tribes of Southern India. Ed. Thurston, K. Rangachari. Asian Educational Services, New Delhi, 2001.


ynsiey pue TuByuOy 
TeyoreLyey] HyEIOy “eXtreqqog ‘yanfueg jepeuuey StAog, syoustp idnpy, 
sox ‘qeourpLeyy ‘TouLog ‘eUENSUIE Ny soQ | epeuuey sony, Surysy ‘seyeyeIeY podiesey| pue ereueD ynog] —eooAedoy|ET 
TSS Jo one 
out ae pyryo 
peur oy} 
0} poutsosiod Auouwis199 BysereyRyy sumuyeig 
srw yepero nedeuey pue ue pure jesuog tpemsereg 
euefeurdy| pure Suruey eAing ‘lAoq ‘nuyst, ‘eAryg wistuelejasaa} = sax | Seuvaag Sunyueq S9\\ VOD] BYPIVUILY [e}SeO_D pnoy|z] 
oy} 18 prryo Suneanno 
yeu ay} ur syadxa ore nsepoy 
0} paursojiod pue amyynoyioy pure injseunyry9 
st ur Auouis199 pue amynouse “eSounys sunuyeig 
pouqyorg vuekeuedy) Surueu supiivjosoa joey] sa. |epeuuey| epeuUeY ART “Q10] IPI, ‘or0ye Suey eyehaey | TT 
uns pur 
pomoyly yeoulpeyy| — BNYQ se padrysiom auo}g} UOHeHOsau Aq are saBeLUe] S39 vsRIOy Anjoyseg eBSeIOy OT 
jo o8e oy) Auouwis199 wemdryouey ‘eujedeSueiig 
ye Auourass9 Surmeu yeourpined| pue wresuelig ‘eyeSueureseN 
pouqnyorg Surpeomp “euLeyEye S| pur yeyorenneg nuyst A, wistueejasaa} = sax] | IWR, qrue |, ‘nednity ‘OANA smeBuas]] 6 
yiod 
Auowis199 yeourined| “s1oysoue pue Jaq ou Ing UeIIR}aSIA sump 
“JENYL amy ON poauqyorg Sox Suruen ‘yeyorenyeg Troy} drysiom Aoyy, uoU pur yeayM ‘13eI ‘OOLI epeuuey epeuury| yeoq ‘omynousy eyeyeuey ipy| 8 
“WOUNOD pen Tae, 
st drysioM 1oysaouy * 1yeys Ur suBUMNy 
pue varys drysiom Aayy, jrequininy 
Auowia199 “U0}s UL dono AYSIU Y pure exyysereyeyy 
“JeENyL ay ON paiqryorg SX SurueN Surdrysiom “ar vyyeumyey epeuuey eqniny, yer0}sed ur siesueyq eqniny| / 
“feddeppis npeuprureL 
“JENYL O11y ON Auowis199 pue euurriseuey Jeaq pue ysopeig 
“oBeLUVUI oY JO spent yepero “eumuELepy ‘(sepmon pue yz1od ou yng ueLejadaA eIupuy, 
ou suroyiod ysorid ayy, sox pouqnyorg sox| pure Surumen qeyzemmeg BIYIPRBULY 104) BAIYS wou pur yeoyM ‘ISB ‘291 epeuuey epeuuey amynousy ‘eyeyeUIe yy epmMoy) 9 
sremof pur 18k ‘991 ST 
“JRooyL ed] pooy a]deys azayy, -y20q Jou Op 
Auourss99|pue yeourined| AY “Uoyoryo pue ysy ‘jour epeuuey sJ9Noge] yeu, 
“TENE omy ON sok pomorry Sok SurumeN ‘yeyoremed Spiod yea pur sueiiej}o80A vou [BARI X amynowse Bye “S100 Rawiox| ¢ 
“ysly pue wos ¢ 
“JeENaL arly ON ‘doays ‘uayoryo ‘y10d “120p olpeurou are 
“ysoud e Aq pouuojiod pourojiod Auowis199 saaJ} pue Sau0}s ‘Jooq ‘uosiq uo pagy Aoy, pue uonyeanno 
JOU a7 SaTUOUIS199 ay saX st odese,, Surueu ‘yerapined] drysiom Aayy ‘diysiom [opr “2011 puk ISBI SI pooy a]deys epeuuey Ipnu nuaf| Sungrys gjoyreseN ‘B1005| eqnimy nuop 
£ 
suonIpeEd 
Aovorjap yenzewr covey pue 
“JeNyL ay ON ON “yeruyined HoAey ‘sio}ss0uy] ® SI yiod ‘sueimja30 uo WPL eavepoy| spury pemyjnowse S100) PARPOY 
uifeueasq 
pue efuey equningy, 
“eyeyeg BYAO “epuny t 
(eArye) wTeom| nuoumutig ‘(eXAeuuayD, poo} a[deys are eindepuny 
pomoyty 9} S}IOYUI WOS S19]S1¢ pue oy) senyg| Ysy pue cor poyiog “SaA-uON| SoA | epruuey nny] — Surdde) Apooy, spaeyzq posiesey ‘dnp PARTI 
(dapn, 
‘oSprq syungq| wody Why9¢) 
94) an s}ung ereaLed ules pur vealed Ur (drysiom jesuns Joye Sunes vindepuny‘tdnpp, I 
‘oSpeq o8eieut qdaoxa (BAlye) yy[eomM| oyeus)oueyperesey‘diysiom| proae pue 3a aye syung uler epeuuey omynoyse “epeuury, 
94} 91) JOU Op spunq peN pomorry sok OU} SIIOYUL WOS SI9}SIG| Jo}soouR‘(e[OY vINYq) seNyg ‘Joog ON “SoA uou puke 3aA] — soA | epeuuey ony, “SIOIpJOS sreAeN podzesey BUTysyeq syung 
saseLlivul |saseLuieul 
° : asenieuiet spengit wie soureu suOnEI0] ByeyeUIEy . 
qsalig sO eae sie AopiAL astoaarg Sionng spengur wrg aoueyLayUy sang poog ydusg | adensueq uoyedns99 wiuouks peers en uoyyndog | oN’s 


Appendix 1c: Karnataka ethnographic details


BIDDARTOYY 


JBIM/S10 eIpednyata/-dny oyeularg BIDOARSOY| CT 
uluyeig epnoy eyoueg 
ou) Suoure ou0 om suTUYyeIg eso, “Use yy 
miuyeig JeMseIeg pnoH| so qefung soddy Jo oAi 1yWBMsRILS JO SyuRQ OU] edoop eyryuey ‘uyeIAeN ‘TNyeyd surumyeig 
JDEM/SIO RIpednIA wa//-dyq] wo paar] oyM suTUTYEIg oy) 0} UBIO roy) wHTETD worewary eysouey ‘ourmmoYRIOON ‘Ipesn] wemsereg pnoy|z] 
“VIPUT YON UF BBYOIYY Woy poyesrur Loy 
(CV 09€-Spe) PULTEYseMALYY SuLy equIepey Aydosoyryd 
SUTUTpeIg -eyeAACH/o: Aq wopsury iseavueg 0} }y8noq sarrurey ByBAPY seAreyorieyueyS SUTUIYeIg 
edpas oo moqd untmyeIg//-diy omy Ayre JO syusosap oy} aq 0} UNTEID AOU, Ipy Moypoy woneuer) eyehaey| [1 
oouep pue aynyy ‘sumiq ayeuiay “THIMOY| pure twejyseyNyoH eBeIOH|01 
al Huepjueg pue res “eqqey 
TEES 0 efped is a /- Gay ouy ‘UyeaRy ‘uremdipeansry y, 
“yueder euysiry ‘Tpnumures| sieSuaxy] 6 
dAdd6 
~LOb-000C-WON/OT/ELLI/6 
OOTAURANS}Ig/adedsp/808-U 
F810 TypraeAprA ooedsp//-dny Byeyeurey Jo o]doad snoursapuy yeung “eASBARUTY pur eqqLy PUNE] vyRILUIEY IPy) 8 
“pes pue anep eUTUTeT[S K 
‘aner eremysopeyeyy ‘anes 
ereaoparoag ‘Huecer esepeyxeuey 
eqnimyy| ‘epeaed reyeursuay ‘ones ereakeq 
(PES eo pad Eps na /- CHG, Souep 3fOF BITUNY NOM TeLng| eppoq ‘earsjoyruey ‘anes ereyre equmy| 2 
TOATY CIPEYESUNTT, SPIEMOT IMTS SOUT OF COT 
GO EVLA WB EPunS Od SaMOONE OY, “SqeMEN pue T] eydndespuer 
=66E-L0°S0E-F-60- PAVIA] so powad oxy Bump soa venue, pe eSueT uoom ___ Rnuesefeavseg 
SAV -L0-000-F-60-WUW/49 MA suorar pur ueyjseley ‘ere{ny ‘qefung se yons pur ryuenues ‘eroseq ‘WeARuEUE 
~L00-000-000-0-60} — suorBax somo oy ayexBrut 0} paoroy o19M AaY_L “TYTOC. s]00} “eysyedniig ‘HYeaedaog ‘pes, 
“UV APUY-L/SUMOP-ZO/A] —_punose poaour pur suordox ueAepeumyy ayy ut Surat Temynouse “eqqey Ipueg “eqqry eumuELe 
oo sraysi[qndanp MMAM//:dyY] arama Ady. “OM ZOTE-000F 0} YORQ Sayep Aroysty ay pur soatuy yeung “‘nyseys eXuewresqns ‘nynued Ipy| epMoy 9 
add T| “81009 Jo sajsunf ayy] 
=ELF-E00C-WON/T/PS0T/6] 0} UT peords Kowp yoryM Woy vyeIoy Ul TeqLTe 
DoT AuRaNSTIq/aoedsp/ogog-u YNog Jo syNyR} at} JO ouO Jo uorTar peux Moe pur 
Ps10 TyprueApra ooedsp/-dyy WOJ SUISLIOge JUeITTUL 9q 0} pres ore Koy, sodid pue sump Avyd Aoyy, eSeoa ipnpusyo Avg Aog ‘ooquieg yeung npeyodeyrey pue Lown} BABIO S 
SPL 
SLUM “79 p-80-1 10-T-O1 
“Huy /S~eUM “79-80-10 “oORJ OY} UO Ap OOM [Tey 
-O1-HUV/Add-1SQV-80-000 jou saop pnur ayy Jey} Os JOUUEUT sjeUOSEIP v 
FI-OlANT V4 M-80-000-0-01| ‘andra yeqyng pue 8 ul 3np st punoad ayy “uonoartp ynos 
“PUY MPUY-L/SUMOf-ZO/M) —-BARTTV JO [Tey OY JOYE ysoroy 0} UI SuIprY YOO agruy ‘more ou ur pooeyd st peoy oy, “woRoetp yyNOs EpeIATYS vy 
oosIDysTqndany MMA//-dyy] seqniny suros yey} sayeys ArO}S UONRISTU [BIO OYL syejoyp Aeyd Kay, pue Moq yyou ay} ul poseyd st asdioo ayy “Aing} = pur LQ ‘nysig ‘Ipesy- ‘uyeIARN eqniny nusy| 
“Plog A191 oy eAepoy 
WOU OY} PAOD|OM SeALPOY Up JEU} Udy} sem J] “WrETST 
0} UT payaauod Apiqrosoy pure euyedesuemss 0} 31003 “UONRSIURSIO [PIDOS PABPOY € 
Jpqz| JO MO USALIP Sem pue UAIpP]IYD pue UsUOM ‘usUT Jo siseq ay} pouioy diysury ‘soueysip a[qesapisuos eB 
~“SELI-C00C-WON/E/EL1£/6] ooozt wer cx0ur pomjdeo ueyng nddry ywourussa0H] 7 paar] ployssnoy ox Jo oqystau ysareou ayy “sueMTUTY 
OOTAURANsHG/a9edsp/N808:U] —ysnug oy} se [JOM se sefey ou JopuN s}Yysy JUR}sUOD! pares asnoy v UT JoyJeTO} paal] EYYO yowa Jo siaquiout oy] ‘Peqyng pue vuvurenjuRs TaAEy 
Tsio TyprueApra‘aoedsp//:dyy 0} onp paypurmp uonendod eaepoy ‘AjpeatIO sp] “ePYO 9} saypeo st AuMUTU0d BAepoy ay} Jo snajonu sy “nypjodyiey BARpOy 
pourojiod 
st yenqu runddap oyey ‘Aep snorordsneun uo 
peop J] ‘Jods oy ye poring are payoayjoo are ra 
soyse ‘Aep yyyly oq) UG ‘arAd J0J pasn jou 
Qyny BOTUIOA-XNN SOUYIAIS JO POOAA “yeUTDI_, PARTIC 
oqo Ssunysy yooo i! 
Lepueyg ‘TueEyeyeg ‘ey ‘erosueg pure Suroes oyeygng ‘Tequooy (diysiom 
Arosjnduros jou si pram) parses v Suis wreyeden ‘apessoy ‘Anos, ayI] soues JOOPING Jensoour) nBapy sArasqQ ‘ayeUIEID epequiey sjung 
saoUa.1ajayy Aa04St] 10ysaauy S19} IVILYD 13T1O sommeuing suoda Ay qeangind [engi yeaq s[eAysay uoyendog on’s 


Reference: Castes and Tribes of Southern India. Ed. Thurston, K. Rangachari. Asian Educational Services, New Delhi, 2001.


nsnpeaL WU Wea] el 
SOR n3njay, nSnjaz.| S2N0qeT yeimyjnouse 
IAop 1SueyeYy Pur SIOxIOM Joye] eAIPPY ‘Suey eSIpe| €1 
13ey JO sjoypru nsnjaL nsnjay SOAvIS Sip] vAareg ‘eAving ‘eAaTOH 
Ssopos a8eiIIA se ouoys diysio JoBuyy pur so11 ‘mogey amnyNsUsy, sreyey ‘eypuy IPy, Rew ZI 
vonpued “Oyo 
BuUEyesueyH 18eI Jorprur “reMol] HeArpoy 
‘qeinde s de: s 2 LIBAR, 
; qeinde st poo acts ou cox nanyoy, nanjoz SM pure LiearpoH 
ayeyp‘esnpueg ‘Jooq ON ‘syeurrue qseq ‘weujedeyyest A, 
eurueyesuey| yeourpiyed pue eurUeyesuey PIEM Joyo pue Ayuodieo ‘urereseueIZIA, 
‘qeindenieyd jeyorenjeg pue vyjeaog Lryer ‘eyeaopnyues ‘eyyeAag tueysin] yeour ‘ysy yeo Ao], pue Ayyrus yorsq “‘We[NyeyLg} eieuuey epuoy] [] 
SoTUOUIAII. remof st pooy afdejs 
Suneo| [eourpeyed pue “yrod Io Ja0q ou ynq SOK nsnyoL nsnpo. 
-uress 4ST yeyorenneg ‘sue1ie}930A UOU uoMRAnNoyyrYys Ippoy BAvpueg LeAepoy yseq Appoy epuoy| OT 
yeauryiged pue nsnpal, nsnpoL 
Teqoreyed unop sueytejo39 A SISOLI PLJsIp UBARPOH| —SUTUIeIg PIPIEA| 6 
nsnjaL nsnjaL 
yeouryijed pue 
yeyorenneg L0H SUBTIE}OSS A, uoneysunupy PLSIp UBARpoH} surumyerg ISOAIN| g 
rourjuyed pur SOQ, nsnyoL, nsnjaL 
yeyorenneg ~n0yH SUBTIE}OSS A, S|SOLIg POLSIP UBARpoH} surumyerg eprariq}] / 
URLI}OBOA sox nsnjay nsnjaL yedeppng 
SovARUYsIE A, uou pure sory JOLUR "WOOT YION OLISIP LeARpoH nfey} 9 
THOUNMULCARTY “PUNE yer Tempered THEUTEATEADY 
equiewejog euEyRAYINYY ‘NiouweyoOyY “euruejog epswurewesy pue wreyediurg 
‘Isepeyq nuearumury eynuuey IKAoA pur rAopeseyg sox n3npoy nsnpoy ‘wreyedeSezi A, 
‘ligyerearg| [eourpiyed pue} ‘naearwury ipuosolg ‘euTWETeYyNN “euTUEYy}eIseyg ‘weujedeppog 
eye eyorenjeg} ‘“eurueuoro x “eumurefey ‘euruepeures ‘nqeg TeSuag Surysty ‘urefuen weer] 
nsnjay nsnjaL 
‘ueDyuRS sueLejason 
pIByRI, BAIS] ON “Bey ‘ory amnynousy Appoy ndey] 
PUILRT}OARpeg PUR I][eY} B[PURHIOA sok ngnjay. nanyay. Indeyueuy 
‘eUIWURLIO[Og SepOD oy} ‘“eUUeNOARpeg pue eUUEyUy ‘pequurezin 
“Wueryues| [eourpeyed pue ‘euueSuey ‘Tuesreuueyy ‘Ipednerq souedAar SUBTIE}OSS A, sooyeyyng ayeorsawog yndajsuryp ‘LRARpoy 
PIEYLI [eyorenjeg} sedured ‘sayaeuysiea sioyjO pue soyeareys are oWOg UON ‘Ory “orMy[NoLIs VI SBUIeIO A, oory YON ‘mun ‘euysLry euuey] ¢ 
Toyyng payuep puel Ieuoy ‘sueATepy [rue |, 
euUUeyUy “eUYysiy 30 UON SOK nBnpo. nanpo. yyrn qjes ‘daoys weay} = ‘nyjzeuoy Jo njnuevoy RIOD) c 
NyeyueIag) sIO}ssouR ayBUIOF UDO 3a, UO: so n3nyjo, nsnyjo, Surddey Appo nPleN selqjeg mo 
(nyeyuetag) 104s yeuray nN A UON A 1°L 1°.L Idde} AppoL ‘sueavy ‘sueSnpe j| {eq mes} 1 
S[BANSa, aouRyLIIyu saya 100. uraysfs dy.19: asensue’ uonednss, saweu wAuoUd, SUOT}BIOT 19) usepetd uorejndo, ON’ 
TEARS yoy uy BPC pooy uelp Jas eT ne Oo Ss 17890] IYO erypuy Uy UOReI07 yepndod N'S 


Appendix 1d: Andhra Pradesh ethnographic details


MPa tl 
eaIpe €l 
ainjsod Surpts & ul paling ee ai 
PRUE 
uoyeuaig| Auourars9 aansuo], Pomorry, pomorry epuoy Il 
uonemarD| Auowissoo Surumeny pemoyly, pomorly sox| Appoy epuoy Or 
uoreonpa 
oqenjur “Surpoay surmyerg 
uonewasg yeoroo “Surueny sox PIPE, 6 
“solloyepngy pur ssury JO syno0o 
ou} UT siaysturur arom Aoyy, ‘pooyjsarid oy} AT[et9edsa ‘uoTRO0A uoreonpa 
snordijor dn aae8 pur sarranoe Arey Surpnypour suoeoo0A oyenjur “Surpoay suruyerg 
Ieqnoas snouea dn yoo) oyM surmyeig asoy} are suruyerg ISOAIN uoneulaI9 [earoo ‘Surueny sox ISOAIN| 8 
poyessIur suTMTYeIg aSay} JO SUIOS ‘Jo}e'T ‘odenSury [rurey, po}dope 
pue npeupMey ul HOAey JOAN JO syuRq oy} UO pay}es pue uojeonpa 
seAypulA JO yINos spreaMoy (Z)eNYseineg WO (CW OOPT-00EI~) oqerquT “Burpooy surumyerg 
oSe sreak 9Qg Ajoyeurrxordde payessrur surmmyerg jo dnois vy uonewaIg yeoroo “Surwmeny sox eplariq L 
oypero ‘Auowrare9 “poSueyoxa 
Surureu ‘Auowia199 are sjoe.ju09 peor paroes 
uonewasg, uoneoryiing uOnU A eam Ay, nfey 9 
JoyjoIq 
JoSunoA| 
Auoutaias ainsuo} poseasap 
uoneulaIg} ‘Auouroseo Surmeyy 0} pamoyTV sox eyes] S 
‘2081 BBURH JOY}O dy} pur oqLy BAPE x 
0} BuroR] 9oRI JeUNT O} SUTTJOWOS puke SoqLy Wey 0} sooR.T) YOTYM 
Q0BI IBJOS SBM SUT] JOI[IeS SY], ‘SorseUAP JOUTSIP OA\} 0} SUOTDq 
Aay} pur sdury gz Aq payni sem aporuosyo nsuoy ay ‘or0fuey Jo uoAId 
Bury efoyD Aq posonbuoo sea yt uayA © Cy Ainjuas yO [un ero st aoyjtioes 
URIJSLIYD OY} WO Poystxd Avy O} SUNTe]D WOPSuUry NsuCy oy], yeurrue 
“se[ey[og pue searyeg ‘seAynyeyg Aq umop ynd sem Jomod aroyy, Auoura1a9 ‘podrysiom 
“elo URTSIYD Apres oy] UI OqLy UeIpIAtIg [nyomod arom sndey uoyeuaIg} Surueu pue japeip [Opt eBuey ndey + 
a sata de Sadie Aaa eee 
sen yea ‘punoss-Suruing 
(AOd 81) eBung enturedysng Jo uornoasiad oy] Woy] — spreMO} pow st UONSaIp 
advoso 0} ey[ap JOALI BUYSLTY 0} poyessrur sureyd onaduey woy] pray ‘a[duio) s,uerpueyouy pouqryorg 
STULMY ISIYppng “e “UepD JoLeM UeATY JUOTOUR Ue ‘efoyquIey| SoyeaI JI Se SIOIq B UO pared 
WO PIALog “UISIIO ISIYPpNg :UISLIO Joy] UO soLIOay}) UayM asnoy ay} JO UOTaIpP Auoura1a9 
are aIOYL, “JOLMISIP weseyerg JO vore [oSueA ‘VyYsereyLy| 9Y} UI st peoy oy] ‘UONeMEID] Surweu pure jopesp PUR € 
AuresA]o poyqryord RIJOD z 
euruey/ndey Jo JooYsyjo engi ong 
‘igs fempereyg JO s}juapusosap oy} 9q 0} WNTe[D IeBeURARlI A YUM ysorid pouqryord seltjeg mos I 
Jo asoy} Seinpeyy Jo seftjeg 10 syeAeNy JO sjuspudosaq uraryerg 
£10)STH 10)s990y [eng yeaq spengry Wg ysorad sO uysnod jayeaed wsno9) asypreua1 Pd10AI see uonendog ON'S 
: . ° sso.19 MOpIM. : Ayaaqng . 


Appendix 1e: Ethnographic notes on Nattukottai Chettiar


Nattukottai Chettiars are a notable business community of Tamil Nadu. They
have a known history of migration, having settled in the Pandiya country ~800 ybp in
a defined area of 500 sq km, 50-100 km east of Madurai (Chettiar, 1874). To date they
practise caste endogamy and clan (patrilineal) exogamy, each clan being affiliated to a
defined temple donated by the Pandiya kings. They are staunch Saivites who patronize
Sanskrit and Veda Padasala (Vedic schools). Many ‘Nayanmar’ poets and scholars of the
Bhakthi movement came from this community, and the main characters of
‘Silapadikaram’ (~2 kya), the iconic epic of the Sangam period, are centred around
them; both Jain and Buddhist philosophies are celebrated in this epic. Nattukottai
Chettiars have been an enterprising, seafaring merchant community trading with the
Far East since the Chola period. They have built palatial mansions and unique summer
palaces of Indo-European architecture, with a central open-air quadrangle and
water-harvesting technologies, and they live with pomp, great philanthropy and
hospitality even today. They practise Vedic rituals in marriages, cremate their dead,
employ Brahmin priests and observe a 16-day pollution and purity period. They have
also imbibed many local ‘Dravidian’ cultural elements, such as the uncle betrothing the
bride in marriage rather than the father giving the daughter as a gift in marriage
(‘kannigathan’, as practised among Brahmins), and the celebration of the ‘puberty’ of girls.


Culturally, the population is divided into 9 patrilineal clans, each with its own
temple (kovil). Three of these clans have further subtypes, which are again treated as
patrilineal clans for all purposes of marriage and inheritance, thus totalling 28 clans.
The names of these kovils designate the Gods and Goddesses worshipped in the place of
their original settlements. As of 2001, a total of 30,941 families lived in their 76
settlements (villages and townships). The forefathers of this caste were granted this
land by Athiveerapandiya.


Appendix — 2 Informed Consent Form 


‘GENOGRAPHIC - INDIA’ 


Madurai Kamaraj University - School of Biological Sciences 
In collaboration with NGS-IBM-The Waitt Family Foundation 


Volunteer Informed Consent form for obtaining Human tissues for Genetics Research 


1. The purpose of the Genographic study, carried out by Madurai Kamaraj
University was explained to me in my local language;

2. I accept that I may not obtain any direct benefit out of the said research
and the results will be available in public domain;

3. I agree that my results will be kept confidential and identity anonymous;

4. I have the liberty to opt out and request you to withdraw the results and
samples from the study at any time;

5. I understand that the blood / mouth wash / cheek swab collected from
myself will be used for DNA based tests that may facilitate better
understanding of the human genome, migration, evolution and related
genetic aspects;

6. I accept that any results arising out of the research shall be published by
the investigators for a better understanding of any genetic phenomena;

7. I volunteer to donate my blood / mouth wash / cheek swab / saliva for the
said study: I am aware of the minor discomfort of venipuncture and that I may
or may not be compensated for the lost wages, discomfort etc.;

8. I agree that a portion of the samples may be stored in a repository and used
at a later date for similar non-profit genetic tests;

9. I freely and voluntarily chose to participate in the study based on the
explanation provided by the interpreter / community leader, and hereby
give my informed consent.


Wage Compensation for lost working hours / travel expenses / subsistence — 
received / not received. 


Name of the Volunteer: 
Address: 


Signature of Volunteer 


Name of Interpreter/Community Leader 


Signature of Interpreter/ 
Comm. Leader 


Name of Sampling Team Leader: 


Signature of Team Leader 


Date: 
(2008/01/3000 copies/SMS) 


Appendix 3: Details of Expeditions undertaken, Advisors and samples collected during the study period 


Columns: Advisors and Collaborators | Expedition | Date | District of collection | Populations sampled | N collected | Sampling team | DNA extraction | NRY HG genotyping | NRY STR genotyping

Date | District of collection | Population sampled | N collected | Sampling team / DNA extraction
1/6/08, 17/7/09 | Trichy, Coimbatore | Nattukottai Chettiar | 171 | VJK*, AS* / AS, VSA, GAK
 | Vizag | Jalari | 98 |
 | | | 100 |
 | East Godavari | Konda Reddy | 50 |
 | East Godavari | Konda Kammara | 50 |
 | Kodagu | Kodava | 99 |
 | Pune | Chitpavan Brahmin | |
 | | Siddi | |
 | | Maldhari | |

Advisors named in the table: Dr Gambhir, Dr Uma N Rao

Team members:
*RMP Prof RM Pitchappan
*VJK Dr V J Kavitha
*AS Adhikarla Syama
*VSA Vasanthakumari Varadarajan A
*GAK Ganeshprasad Arun Kumar

Advisors and collaborators:
Dr Veeraju, Andhra University, Vizag
Dr Velaga Lakshmi, Dept of Genetics, Andhra University, Vizag
Dr Mahan Kali, Dept of Genetics, Andhra University, Vizag
Dr Sambasiva Rao, Dept of Anthropology, Andhra University, Vizag
Mr Vasudevan, Bangalore
Air Marshal Chengappa, Bangalore
Dr Gangadhar, Dept of Anthropology, Manasa Gangotri, Mysore
Dr Bhat, Dept of Anthropology, Manasa Gangotri, Mysore
Mr Arun Kumar Suvarna, Mangalore
Dr Agarkar, Homi Bhabha Centre for Science Education
Dr V Gambhir, Director, Sholapur Science Centre, Sholapur
Dr Uma N Rao, Director, IWSA, Mumbai
Dr Kamaloorkh Marolia, Assistant Professor, K J Somaiyya College, Mumbai
Dr Vidyanand Khandagale, Assistant Professor, Shivaji University, Kolhapur
Dr Mumtaz Baig, Reader, Amaravati University, Amaravati
Mr Ashok Vyas and Mr Ashok Mel, Agakhan NGO
Dr Krishna, University of Baroda


YHG L-27/76 and YHG J-304 subtyping for all the samples collected by Genographic India was performed by AS (self).
Tamil Nadu and Kerala samples studied by VJK were used for comparative study with my study populations.
Orissa samples studied by VJK and GAK were used for comparative study.


Appendix — 4 


GPID                Date of Sampling
GENOGRAPHIC - INDIA
Madurai Kamaraj University


Volunteer Enrolment Form 


Address (Permanent) 
Door No 


Location / Sampled place

Gender (Sex) 

Native Language 

Ethnicity (Caste) : Subcaste / Gotram : 
Place of Birth : 
Age/DOB _ 


Ethnicity 
Place of Birth 


Father : Native Language
Ethnicity
Place of Birth


Maternal GM Ethnicity
Place of Birth


Maternal GF Ethnicity
Place of Birth


Paternal GM Ethnicity
Place of Birth


Place of Birth 


If married / wife/husband belong to the same village / nearby Km 


Whether your parents were related before marriage? Y)es / N)o
If yes, U)ncle - Niece / F)irst Cousin / D)istantly related 


Sib - ship size : 1) 2) 3) 4) 5) 6) 7) 


Any other observation 


Sample: Mouth Wash


Sampled By Date 


(2008/01/3000 copies/SMS)


APPENDIX - 5 


GPID Date of Sampling 
GENOGRAPHIC INDIA 
Volunteer Enrolment Form - "Nattukottai Nagarathar" 


Address (Permanent in Chettinad) 


Street:
Town/Village 


Pincode 
State 
Others 


4 Gender (Sex): Male
5 Native language: Tamil
6 Ethnicity (Caste): Nattukottai Chettiar
  Subcaste/Gotram/Kovil
7 Place of Birth
8 Age/DOB
9 Mother: Native language: Tamil
10 Kovil
11 Place of Birth
12 Father: Native language: Tamil
13 Kovil
14 Place of Birth
15 Maternal GM Kovil
16 Place of Birth
17 Maternal GF Kovil
18 Place of Birth
19 Paternal GM Kovil
20 Place of Birth
21 Paternal GF Kovil
22 Place of Birth


Whether your parents related before marriage? Y)es/ N)o 


If Yes, U)ncle- Niece/ F)irst Cousin / D)istantly related 


Your Sib-ship size: 1) 2) 3) 4) 5) 6) 7) 


Any other observation 


Sample: Mouth wash 
Sampled By Date 


Appendix 6 
‘GENOGRAPHIC - INDIA’


Madurai Kamaraj University — School of Biological Sciences 


In collaboration with NGS-IBM-The Waitt Family Foundation 


SAMPLING TEAM 
Name of the Population: 
Name of the City/Village: 
Taluk: District: 
State: PIN 


3. Group Leader / Village Head: (Team leader / next in command should explain the purpose, study design and about
the legacy project and obtain consent from the individual / Clan / Group / Household / settlement / village Leader or asap: Sampling to be done
only on voluntary basis: no allurement. Compensation for loss of wages and transportation may be paid)


Name: 
Address: 
City/Village: Taluk: 
District: INDIA / 
PIN: Tel: 
Signature: 
4. Field Work Support By: Name of NGO / personnel / Local Scientist 
Name 
Address 
PIN: Tel.No: 
Signature: 
5. Sampling Team: Team Leader: 

Doctor(s): 

Technician(s): 

Student(s): 

Field Work Asst(s): 

Other(s): 

No of Samples Collected: 

Notes/Comments: 
Date: Station: Signature - Team Leader


Appendix 7 — Village Document 


Date of Sampling: 


‘GENOGRAPHIC - INDIA’


Madurai Kamaraj University — School of Biological Sciences 


In collaboration with NGS-IBM-The Waitt Family Foundation 


SAMPLING - VILLAGE DOCUMENT 


1. Name of the Population(s): 
2. Name of the Settlement: 3. Village: 
4. Taluk: 5.Dist: 6. State: 
7. Co-ordinates: Longitude: 8. Latitude:
9. Altitude: 10. Vegetation: 
11. Climatic Conditions: 
Period (months) | Temp Max/Min (Celsius) | Rel. Humidity (%) | Rain fall (cm)
At the time of Sampling: 12) | 13/14) | 15) | 16)
Summer:
Winter:
17. Number of the Household: 18. Population size: Total: 
19.Male: 20. Female: 21. Children 


22. Subsistence mode: Foraging / Agriculture / Labour /


23. Village Economy: Foraging / Plantation Labour / Coolie / Agriculture / 

24. Nature of Housing: 25. Public Toilets:No 26. Wells:No 

27. Terrain: Plain/ Hill/ Mountain/ Slope/ Low lying/ Sea coast/ Shrub jungle/Desert/ Forest/ 
Others 

28. Provisions: Govt. Drinking Water : overhead tank / well / stream 


29. Balwadi: Y/N 30. Nearest Hospital: km 

31. Transportation: Trek / Road / Bus / Other 32. Bus frequency trips / day 
33. Electricity: Village Yes/No 34. House-supply: Yes/No 

35. Fuel : Fire wood / Kerosene / Gas 

36. Nearest shopping: km 37. Township: km 38. Shandy: km


39. Place name: 40. 41. 


42. Society: M)atriarchal / P)atriarchal 


43. No & Name of Tribes/ Castes living in the village: 
1) 2) 3) 4) 


Clans: 


44. Language spoken: 45. Written script: 


No of Educated persons in the village: 
46. Graduate 47. School final 48. Elementary 
Professional: 49. Doctor 50. Engineer 51. Agriculture 
52. Computer IT 53. Management: 54. Post Graduate 


Any Comments: 


Oral History of Origin and Migration History: 


Cultural details: Festivals celebrated: 


Chief Deities: 


Marriage rituals: 


Birth rituals: 


Puberty Rituals: 


Death Rituals: Burial / Burning / Burial site 


Chief Diet: 


Artifacts used: 


Musical Instruments used: 


Village Document filled By: Signature: 


Appendix 8 : Compositions of Solutions used for DNA extraction 


White Cell Lysis buffer (WCLB) 


Stock Final 
Batch N R t 
comms |e] So [Ea | 


usp_*[103602 sos] 0% [02 fe | 
fw [»[ o [» 
a 


Tris- EDTA Buffer 


Stock Final 
C Batch N R t 
sick eagenS | Cone a Conc a 


Make up the solution to 50 ml using autoclaved Milli-Q. 
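
The buffer tables above pair a stock concentration with a final concentration and a fixed make-up volume of 50 ml. A minimal sketch of the underlying dilution arithmetic (C1 x V1 = C2 x V2) is given below. The reagent names and concentrations in `recipe` are hypothetical placeholders chosen for illustration, not the values of this appendix, and the function name `stock_volume_ml` is likewise an assumption.

```python
def stock_volume_ml(stock_conc, final_conc, final_volume_ml=50.0):
    """Volume of stock solution needed, from C1 * V1 = C2 * V2.

    Both concentrations must be given in the same unit (e.g. mM or %).
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if not 0 <= final_conc <= stock_conc:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return final_conc * final_volume_ml / stock_conc


# Hypothetical recipe (NOT the values used in the thesis):
# reagent -> (stock concentration, final concentration), both in mM.
recipe = {
    "Tris-HCl": (1000.0, 10.0),
    "EDTA": (500.0, 1.0),
}

FINAL_VOLUME_ML = 50.0
volumes = {name: stock_volume_ml(stock, final, FINAL_VOLUME_ML)
           for name, (stock, final) in recipe.items()}

# The remainder is made up with autoclaved Milli-Q water.
water_ml = FINAL_VOLUME_ML - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} ml of stock")
print(f"Milli-Q water: {water_ml:.2f} ml")
```

The same function applies to percentage stocks, provided the stock and final concentrations are expressed in the same unit.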


Appendix 9: Haplogroup assignment pattern based on ancestral/derived state of YSNP markers 


S 
= 


1072 


87P/LTVN 


oom P| 


bean se) slates st pal alse] ale a oles pel psa ol sla ele ea Tel as | ef ep eel sy al eel a 


= a 
2 IsiSlFiSJElslelslZlalelsleolSlolalalSisialelS|Slels|slelalals 
> _|SISISISISISISIZISISIZISIS/S ISISISIS|SISISISISISISISISISISIS 
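
The logic of assigning a haplogroup from ancestral/derived marker states can be sketched as follows. This is an illustrative example only: the marker tree below is a small, simplified subset using widely used marker names (M89 for F, M9 for K, M45 for P, M207 for R, and so on), not the full YSNP panel of this appendix, and the names `TREE` and `assign_haplogroup` are assumptions made for this sketch. A sample is assigned to the deepest haplogroup whose defining marker is derived and for which no typed upstream marker is ancestral.

```python
# marker -> (haplogroup label, parent marker); a simplified, illustrative tree.
TREE = {
    "M89": ("F", None),
    "M69": ("H", "M89"),
    "M52": ("H1", "M69"),
    "M304": ("J", "M89"),
    "M172": ("J2", "M304"),
    "M9": ("K", "M89"),
    "M20": ("L", "M9"),
    "M45": ("P", "M9"),
    "M207": ("R", "M45"),
    "M173": ("R1", "M207"),
}


def assign_haplogroup(calls):
    """Return the deepest consistent haplogroup label, or None.

    calls maps marker name -> "D" (derived) or "A" (ancestral);
    untyped markers are simply absent from the mapping.
    """
    best_label, best_depth = None, -1
    for marker, (label, parent) in TREE.items():
        if calls.get(marker) != "D":
            continue
        depth, node, consistent = 0, parent, True
        # Walk up to the root; an ancestral call upstream is a contradiction.
        while node is not None:
            if calls.get(node) == "A":
                consistent = False
                break
            depth += 1
            node = TREE[node][1]
        if consistent and depth > best_depth:
            best_label, best_depth = label, depth
    return best_label


sample = {"M89": "D", "M9": "D", "M45": "D", "M207": "D", "M173": "A"}
print(assign_haplogroup(sample))
```

In practice markers are typed hierarchically, so a sample derived for a parent marker is tested only for the markers immediately downstream of it; the sketch instead accepts any set of calls and checks their consistency.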


Appendix 10: List of Y STRs detected using Y filer Kit 


Locus | Allele Range | Dye label | Repeat Motif | Positive Control


Al 
DYS456 13-18 TCTCT | 15 | 
DYS389I 10-15 (TCTG)3(TCTA)n | 13 |
FAM [Blue] 

DYS390 18-27 (TCTA)2(TCTG)n
DYS389II 24-34
= 7: | 

2 


7-25 
8-16 
7-13 
8-15 [NED [Yellow] 
20-26 
7-18 
Y GATA H4 8-13 
13-17 
23. | 
17-24 


List of Y STRs and indels detected in Multiplex 2 assay 


Locus | Type of polymorphism | Dye label


DYS426 
DYS388 
M139 5G to 4G 


AGAT 


TCTG 


Q 
| 

4 

> 


GAAA 
DYS458 14-20 GAAA 
DYS19 10-19 VIC [Green] |TAGA | 15 
GAAA 
‘AT 


T 
TAGA 


| 13 
i 
| 12 
TATA | 24 | 
| 13 
| 13 
| 15 | 
| 12 


lele 
15 
13 
4 
9 
17 
15 
13 
11 
12 
4 
13 
13 
15 
TTTTC 12 
19 


> mel 
a > 
> 
=] 


(oe) Iz] (2) 9°ET (8°) ST (T's) $6 (7D OLE (L°2) LOL (6) 16 Besa 
Goro COLT ©) TI Ts) ¢81 OES SIS PATEL] pe 
(Fp) ELT (€0) 76 (O16 LpooquieN} 3 

Jepeuuey 5 
(19) $97 (€€) oT Sb) FIT (6°) 16 Ten 
(eo rot] OO TI (9"r) TOL (LO StS TL (9'r) OST 9) 6°91 (Cb) IZ (78) 6°LZ (re) L’st}'p) Srt9) ooz(S) Tet h) Ezz] Ch) 07 pajood Ysapead BAypUy 
(case | s) eT () 8b1 (LOD 7 @DLY (¥'0) L'0 PUe 
(SH) TET (os) C11 Oru V PAEpeA 
gose | @DSOI (26 (36) €07 ndey 
(PE) TTI Oss ANV Uluyeig] 5, 
G@osi | rr (6L) 8°S7 (Ep) LOL ($9) 6ST (yO 16 (eg) EST wlreqnies = 
apsz | Dre TLL O1D9E eprawig uwuyeig| 
ODLL (E72) oS (OSE @sr (9p) ELT nfey y 
($2) 6 (TO €9 (TL) re7 (7S) 9°91 BIBUILIE YY BPUOy] 2 
(Z€) 8°01 (2) 9FI (€'€) 891 Appoy wpuoy] = 
(LO) € (80) TZ (Zo) s°0 (SO) I (Ds 0 ueyer 
(6S) TEL (by) VEL BBIpEN, 
js) ve | (2) 8'b1 60 €8 [ (Te) LOL mea 
(peer | (Cr) 8ST (p) 9°61 @DL9 (€9) 6 €7 LL) 6'0€ Pe 
(sp) er | (89ST [L) 9°7H'S) TST (F7) 8°6 ¥°0) OEL (o'r) OTE F) FLT Sb) LET (8b) S81 (60) TTL (780) ¥'8e'b) SST b) SLUE'O) CEES) LIZ pafood eyejeusey 
OO | Cr) sor (TO) EL 9) 917 eqniny 
(L'€) 6°6 BBRIOY 
(“ps or | (OTs (L'2) 66 9) $6 Paria 

(Sas (TU (DOL (ZCI) TET (Ds ODEr eypAaey UILIYyeg 

(ve) roe} (69) SLI (s'b) ¥'SZ (6's) 9°07 (GL) €°7#'8) STE PABIOX a 

(DE (16) 6ST (by) V1 eqningnuar) 3 

(Sa) 76 (€) L'O1 (lb) ppt Os ofrD IE eqereurespy | & 

(g'6)oe | W9)TEL @ss (Be) STI ('s) rez =pMon) FF 
COD rLt | O)TIc OPER (LZ) $8 (LO 8 95) 1:07 (€'€) STI BIQOARHO 
asst] (esl fore (Vp) £17 (C6) FET “Des syung 


BAEPOY 


Brera 


PEeEO 


qe | rece 


(CE) VEL (S72) 18 (QL) T6I ($7) 9°6 Wpemserespnoy uUEyeIg, 
(FZ) S01 [ “OSL () 86 TesuaAy 
Gogo EL | (LOD TLT “LO Lu (9°72) SOT (POLL (L's) €'81 (ZS) $7 (8°b) 797 (€"€) €1 (PD 78h (p'b) T0Z pajood e.nysereyeyA, 
(0) $6 TIO 
('€) 01 (+) TEL () 18 Te |Oy 
(36) 76 (S$) SFI (US) rh (rE) LOL SIT THEAY 
ODLE (60) Le (s°9) 797 (8%) VOI Hexeyy 
(Q) 911 puop ey] = 
(b) SPI (6L W pucp| = 
(90) ST (8°€) 601 Bae] = 
Tae_Oy| = 
dort | COET Te) TOL eRe] © 
(CH) ror aesueyq 
Dre | @PrZI (9°) 76 WRABAIIY UTUNyeIg, 
(ys)99T | (WE) TET PS) ost (DPE (“0 spl) v'0E aesied 
(PID €LZ eYyseYysog UTUTyeIg 
(r9) For | OD OIL (72) 8°8 (V) OFT (S's) 691 (98) FLT (Z"8) TSZ (€7) €:0T [(9) E17 (Lb) TSZ pajoog yeavlns 
(CD SL qndiey 
(CO L8 TAYE) 
(L°0) 8°S unduog urnqerg 
(Des TINY UTenErg, 
(8D 71 ulep 
a : Qa 
(Ve) TEL GQdTL red] = 
(+1 Heap 5 
(DTE Tosa] 
BMIEY 
(Z's) EST TOE9 BIEMIOy 
(CS) Ter (2) 6 (r'2) 68 (ors) STZ (©) O11 vIpOyey 
apzez [| (Dro (Te) $6 (L's) 677 OTL BABS 


uoyendog 


<<< OH AW 


Appendix 11: Average Square Differences (ASD) based age estimates among various study populations


Values within the brackets () indicate standard error in kya.
Ages are mentioned in kya.
¢< Sodus 10} HH yor orp 10J payeynoyeo sem o3e GSV x 


(Lp) 9z|_ (d) O'STP) 8 ($2) 6 (6°€) €9T (Ee) ST (8°9) TOE] (ID ere (8°2) s‘r1['9) 66ZhD 8'6r]'9) E1E]'S) L’ze]'6) Vee] h) FEL pajood npeN [ue |, 
WOT Cor QO SIT BaVYSeMOS 
(€O 6€1 (S's) L07 vurepeA UlUyeg 
CO LEL wreuRreyseyeig UNUIe Ig 
OL) Toe] (1) or (9) €@ (€-@) Lot's) SEE NI Bavypey 
Goce] (Orit (6) 6FT (98) £6 9) 9°Szf'8) I've aeAyuUeA 
(€°9) SLI TO 99 (DLL (Te) 911 (ep LT (ZS) 77 OU 
FDP epouL 
sules [uel 
(SO LL (Tp) 9'EL aeypey TepeuTeT 
@TU (6S) 12 (19) Lez (L'0) O11 qekered 
(OL) SI (811) 9°07 qeavreg| 
(L'y) 9°7e((b) TL vaqued| 
(Cb) 781 reed] Z 
(re) 78 (€) °8 (€) oP I]'6) erz ueAled & 
(s)rol| (€€) S6l QaeEs (Se) rel (L'€) O11 (L's) £07 (S61) 8'rr]'8) E916) 9°ZE aepen 
TeAmMpNyL 
ONez] Wor) Ise (i's) 1 5) 8°81 (1°) 68 (6b) 797 TRARIE| 
(H78 (SS)LOI joo equity 
($29 Bnag equininy 
(o9) 8 equininy 
(go x0] (TH SET Drs BIOS 
DIL (L°6) £07 (pL) VEZ UeyoreUN ey 
ueIeyIURy 
(SO) TT (60) £7 (yr) 9°9 aepey 
(TOE (C61) 0€ (Cp) EFT en 
(SET) E57 (v2) IT (9) TOT (ZI) Sz (3°27) 96 (6b) SLT BARYZA 
(80 OTT (L°0) #°ST (L) 8°LZ (6°S) SIT (61D F°SZ (Eh) SET Pood ESSIIO S 
+8 (or) £6 qeXepueyy 
(L0) 17 BOL BAO! 
(Z€) FOL 097 ypuoy eisu0yq) © 
(8°€) FET (¥'0) FO emog eauey| 3 
(€) ST eqepep| * 
(Ep) SIL (1's) Vel JaMOT Opuog 
(62) 6'8 (€0) 8's opuog 
(ep) Lt] (2°20) $71 OST (L'€) LOT (FL) O81 (9°€) 8°61 (L°9) SLT (€°€) F°ZIf'S) F'Ez|"9) O'SZ (8°L) 6°07 aod BSSIIO N 
QO S8 (70) 66 expe UIMIBIG 
enpeyf UTP Ig 
(¢°7) 9°01 (90) TT (US) LUT (pe) LOT|9D L’s|'s) 707 worlo] 
(£2) 8°6 (F206 (S) S11 HO PUD] & 
BDTs TOS Teulug| 3. 
LO 811 “D&T (CQ SL (CTD OFZ uesty] © 
(€€) E71 DTE (£0) Pl (L'€) LOL b)9'¢T Tey0eg 
(8) 66 (0) LT (Sd) FOr (9°02) L'6 epuny| 
COs (90 L'6 oH 
au vera | xa [eetd | 10 ad frPevco] seveo | ezo | vet | eed | xza | aM | ace [oreze [rece | eer [x wTH [tein] «tin [| «tH | «H D ad [ 99 wo [ua | a Vv uonendod 


Appendix 12: Population codes 


Region 
Karnataka 
East India 
Europe 
Argentina 

outh Pakistan 
Afghanistan 
Maharastra 
Karnataka 
Rajasthan 
Karnataka 
West India 
Bihar 
Pakistan 
Pakistan 
Rajasthan 
Uttar Pradesh 

Spain
Karnataka 
N.ORISSA 
Tamil Nadu 
Rajasthan 


East Caucasus 
Lebanon 
Lebanon 
Lebanon 

West Caucasus
Middle East 
Middle East 


Tamil Nadu Ezhava
Rajasthan
Karnataka


Near East 
Tamil Nadu 
Near East 

South India

South India
N. Africa 
Pakistan 
Assam 
Andhra Pradesh 
N. Kerala 
Andhra Pradesh 
Maharastra 
Gujarat 
Karnataka 
Gujarat 
Maharastra 
Andhra Pradesh 
Karnataka 

South India
Karnataka 
N. Africa 
Pakistan 
Andhra Pradesh 
Gujarat 
N. Kerala 
Maharastra 
Rajasthan 
Karnataka 
Middle East 
Middle East 
N. Kerala 
Tamil Nadu
N. Kerala
Tamil Nadu 
Tamil Nadu 
Tamil Nadu 
Tamil Nadu 
Tamil Nadu 
Afghanistan 


Irula
Near East


Mala 




NadarCape

ir 
NC_EraniKovil 
NC_Illupakudi 
NC_Mathur/Manalur 
NC_PillayararPatti 
NC_Surakudi 
Nurestani 



Middle East Palestinian


Pallan
Afghanistan 
Afghanistan Pash 
Tamil Nadu Parayar Pay
Tamil Nadu [Printer [PK 
Andhra Pradesh [Raja ———*dR 
Rajasthan 
Andhra Pradesh


Punjab SikhJatt
Pakistan 
Pakistan 

Spain


yrian Syrian 




Afghanistan 


Afghanistan [Tak ———=* 
Tamil Nadu TamJ 
India 
N. Kerala
Fast Asi bean ————SSSCSCS~«* 
NAc NAtrice = 
Turkey Turkish
East Asia Uygur
Afghanistan Uzb
Tamil Nadu Valayar
India Vanniyar
Tamil Nadu Vanniyar Van
Maharastra War 
West Eurasia Weu 
China bo S*i 
Andhra Pradesh Yad 
Tamil Nadu Yad_T 
Karnataka Yerava




APPENDIX - 13 


Jenu Kuruba tribe at Nagarahole forest, Karnataka, Sep 2008
Genographic sampling team at Nagarahole forest, Karnataka, Sep 2008


Yerava tribe from Kutta, Coorg, Karnataka, Sep 2008
Mogaveera volunteers at Bolar, Mangalore, Nov 2008


Mandyam Iyengar volunteer at Bangalore, Nov 2008
Warli tribe at Nareshwadi, Maharastra, Nov 2010


Korku volunteers at Gavilgad, Maharastra, Nov 2010
Student volunteers at college sampling at Gavilgad, Maharastra, Nov 2010


Kathodia, Gujarat, Oct 2010
Kotwalia, Gujarat, Oct 2010


Kutchi Brahmin, Gujarat, Nov 2010
Siddi, Gujarat, Nov 2010


Siddi, Gujarat, Nov 2010
Siddi, Gujarat, Nov 2010


Sompuri Brahmin, Gujarat, Nov 2010
Sompuri Brahmin, Gujarat, Nov 2010


OPEN ACCESS Freely available online | PLOS ONE


Population Differentiation of Southern Indian Male 
Lineages Correlates with Agricultural Expansions 
Predating the Caste System 


GaneshPrasad ArunKumar, David F. Soria-Hernanz, Valampuri John Kavitha, Varatharajan
Santhakumari Arun, Adhikarla Syama, Kumaran Samy Ashokan, Kavandanpatti
Thangaraj Gandhirajan, Koothapuli Vijayakumar, Muthuswamy Narayanan,
Mariakuttikan Jayalakshmi, Janet S. Ziegle, Ajay K. Royyuru, Laxmi Parida, R. Spencer Wells,
Colin Renfrew, Theodore G. Schurr, Chris Tyler Smith, Daniel E. Platt, Ramasamy Pitchappan (1,13,*),
The Genographic Consortium


1 The Genographic Laboratory, School of Biological Sciences, Madurai Kamaraj University, Madurai, Tamil Nadu, India, 2 National Geographic Society, Washington, District 
of Columbia, United States of America, 3 Institut de Biologia Evolutiva (CSIC-UPF), Departament de Ciéncies Experimentals i de la Salut, Universitat Pompeu Fabra, 
Barcelona, Spain, 4 Department of Biotechnology, Mother Teresa Women’s University, Kodaikanal, Tamil Nadu, India, 5 Nilgiri Adivasi Welfare Association, Kota Hall Road, 
Kothagiri, Tamil Nadu, India, 6 Government College of Fine Arts, Chennai, Tamil Nadu, India, 7 Department of Zoology, St. Xaviers College, Palayamkottai, Tamil Nadu, 
India, 8 Applied Biosystems, Foster City, California, United States of America, 9 Computational Biology Group, IBM - Thomas J. Watson Research Center, New York, New 
York, United States of America, 10 McDonald Institute for Archaeological Research, University of Cambridge, Cambridge, United Kingdom, 11 Department of
Anthropology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America, 12 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, 
Hinxton, United Kingdom, 13 Chettinad Academy of Research and Education, Kelampakkam, Chennai, Tamil Nadu, India 


Abstract 


Previous studies that pooled Indian populations from a wide variety of geographical locations have obtained contradictory
conclusions about the processes of the establishment of the Varna caste system and its genetic impact on the origins and
demographic histories of Indian populations. To further investigate these questions, we took advantage of the fact that both the Y
chromosome and caste designation are paternally inherited, and genotyped 1,680 Y chromosomes representing 12 tribal
and 19 non-tribal (caste) endogamous populations from the predominantly Dravidian-speaking Tamil Nadu state in the 
southernmost part of India. Tribes and castes were both characterized by an overwhelming proportion of putatively Indian 
autochthonous Y-chromosomal haplogroups (H-M69, F-M89, R1a1-M17, L1-M27, R2-M124, and C5-M356; 81% combined) 
with a shared genetic heritage dating back to the late Pleistocene (10-30 Kya), suggesting that more recent Holocene 
migrations from western Eurasia contributed <20% of the male lineages. We found strong evidence for genetic structure, 
associated primarily with the current mode of subsistence. Coalescence analysis suggested that the social stratification was 
established 4-6 Kya and there was little admixture during the last 3 Kya, implying a minimal genetic impact of the Varna 
(caste) system from the historically-documented Brahmin migrations into the area. In contrast, the overall Y-chromosomal 
patterns, the time depth of population diversifications and the period of differentiation were best explained by the 
emergence of agricultural technology in South Asia. These results highlight the utility of detailed local genetic studies 
within India, without prior assumptions about the importance of Varna rank status for population grouping, to obtain new 
insights into the relative influences of past demographic events for the population structure of the whole of modern India. 


Citation: ArunKumar G, Soria-Hernanz DF, Kavitha VJ, Arun VS, Syama A, et al. (2012) Population Differentiation of Southern Indian Male Lineages Correlates with 
Agricultural Expansions Predating the Caste System. PLoS ONE 7(11): e50269. doi:10.1371/journal.pone.0050269 


Editor: Manfred Kayser, Erasmus University Medical Center, The Netherlands 
Received April 18, 2012; Accepted October 22, 2012; Published November 28, 2012 


Copyright: © 2012 ArunKumar et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits 
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 


Funding: The study is supported by “The Genographic Project” funded by The National Geographic Society, IBM and Waitt Family Foundation. CTS was 
supported by The Wellcome Trust (Grant number 098051). The funders had no role in study design, data collection and analysis, decision to publish, or 
preparation of the manuscript. 


Competing Interests: Janet S. Ziegle is an employee of Applied Biosystems. Ajay K. Royyuru, Laxmi Parida and Daniel E. Platt are employees of IBM. Asif Javed
and Pandikumar Swamikrishnan, both members of the Genographic Consortium, are also employees of IBM. There is no patenting or profit making to be declared.
This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials. 


* E-mail: pitchappanrm@yahoo.co.uk 
3 These authors contributed equally to this work. 


Consortium members are listed in Acknowledgements.


PLOS ONE | www.plosone.org 1 November 2012 | Volume 7 | Issue 11 | e50269


Introduction

Contemporary Indian populations exhibit a high cultural,
morphological, and linguistic diversity, as well as some of the
highest genetic diversities among continental populations after
Africa [1,2]. Indian populations are broadly classified into two
categories: ‘tribal’ and ‘non-tribal’ groups [3]. Tribal groups,
constituting 8% of the Indian population, are characterized by
traditional modes of subsistence such as hunting and gathering,
foraging and seasonal agriculture of various kinds [2,3]. In
contrast, most other Indians fall into non-tribal categories, many of 
them classified as castes under the Hindu Varna (Color caste) 
system which groups caste populations, primarily on occupation, 
into Brahmin (priestly class), Kshatriya (warrior and artisan),
Vaishya (merchant), Shudra (unskilled labor) and the most recently
added fifth class, Panchama, the scheduled castes of India [2,3]. 
Generally, both non-tribal and tribal populations employ a 
patrilineal caste endogamy. This practice, together with the 
male-specific genetic transmission of the non-recombining portion 
of the Y-chromosome (NRY), provides a unique opportunity to 
study the impact of historical demographic processes and the social 
structure on the gene pool of India. 

The distribution of deep-rooted Indian-specific Y-chromosomal 
and mitochondrial lineages suggests an initial settlement of 
modern humans in the subcontinent from the early out-of-Africa 
migration [4,5,6,7,8,9]. The greater genetic isolation of many 
tribal groups and their differences in Y-chromosomal haplogroup 
(HG) lineages compared to non-tribal groups, have generally been 
interpreted as evidence of tribes being direct descendants of the 
earliest Indian settlers [2,10,11,12,13]. Moreover, these tribe-caste 
genetic differences have been attributed to the establishment of the 
Hindu Varna system that has been maintained for millennia since 
both Y chromosome and caste designation are paternally 
inherited. However, the origin of the caste system in India is still a
controversial subject [8,14,15,16], and there are two main schools 
of thought about it. First, demic diffusion models propose an 
expansion of Indo-European (IE) speakers 3 Kya (thousand years 
ago) from Central Asia [10,17,18,19,20,21,22]. Alternatively, 
other models propose the origin of caste as the result of cultural 
diffusion and/or autochthonous demographic processes without 
any major genetic influx from outside India [6,7,16,23]. Overall, 
the genetic impact and mode of establishment of the caste system, 
the extent of a common indigenous Pleistocene (10 Kya to 
30 Kya) genetic heritage and the degree of admixture from West 
Eurasian Holocene (10 Kya) migrations and their level of impact 
on the tribal and non-tribal groups from India, remain unresolved 
[5,6,7,10,16].

The lack of consensus among previous studies may reflect 
difficulties associated with the conflicting relationships between 
genetics and the socio-cultural factors used to pool truly 
endogamous groups into broader categories, sometimes grouping 
Indian populations sampled from a wide variety of geographical 
locations together, such as a tribe-caste dichotomy or caste-rank 
hierarchy [2,5,7]. One goal of pooling data from multiple 
populations has been to smooth individual drift effects in an effort 
to reconstruct putative ancestry [10] and thereby potentially infer 
the past demographic processes shaping genetic diversity. How- 
ever, the success of this approach relies on whether the 
classification employed indeed reflects the true historical relation- 
ships among these endogamous groups. Methods seeking to 
identify the best grouping from an exploration of alternative 
possible classifications, based on seeking maximal between- 
population differences and minimal within-population variation 
[24], would be of special relevance for studies on Indian 
populations classified based on Varna status. This is the case 
because several castes have suffered from historically fluid 
definitions of their rank status, and both the origins and the scope 
of the genetic impact of the Varna system on these populations are 
still unclear [8,20,25,26,27,28]. Further, since the implementation 
of the Varna system throughout India was not a uniform process 
[17], broad classifications of multiple Indian samples from all over 
the subcontinent based on Varna status, or tribe-caste dichotomy, 
may not reflect true endogamous populations and could also 




Genetic Structure of Southern Indian Populations 


obscure genetic signals and the finer details of Indian demographic 
histories. For this reason, a genetic study using a careful and 
extensive sampling of well-defined non-tribal and tribal endoga- 
mous populations from a restricted area designed to reduce the 
confounding relationships among socio-cultural factors, without 
presuming Varna rank status, to find empirically the best 
approach of population grouping, could be a successful model to 
obtain new insights of past Indian demographic processes. 

Here, we attempted to apply this strategy to unravel the 
population structure and genetic history of the southernmost state 
of India, Tamil Nadu (TN), which is well known for its rigid caste
system [15], and to relate the resulting genetic data to the 
paleoclimatic, archaeological, and historical evidence from this 
region. The paleoclimatic and archaeological records show post- 
LGM (Last Glacial Maximum) wet period expansions of foragers 
into the region, whose interactions with later aridification-driven 
migrations of agriculturists have been traced 
[29,30,31,32,33,34,35]. Archaeology also reveals the establish- 
ment of metallurgy [36] and river settlements [17], just several 
centuries prior to the creation of the earliest written records of the 
Sangam literature (300 BCE to 300 CE). These historical records 
named several populations including some in the present study 
(e.g., Paliyan, Pulayar, Valayar) reflecting the existence of these 
now endogamous groups at that time [37,38]. More recent reports 
dated to the 6th century CE, under the reign of the Sarabhapuriyas
[39], illustrate the local implementation of the Varna system
around 1 Kya, following the arrival of Brahmins into the region
[15,17]. The Tamil epics of this period, such as the Purananuru 
anthology and Silapathikaram, describe a society with a well- 
defined occupational class structure based on subsistence practices 
[22]. Earlier genetic studies of TN populations identified clear 
differentiations of endogamous ethnic groups classified into Major 
Population Groups (MPG) based on socio-cultural characteristics 
reflecting subsistence, traditional occupation, and native language 
(mother tongue) [40,41]. Although some studies have identified hill 
tribes as the earliest settlers, and others suggested a common 
genetic signature among distantly ranked-caste populations, the 
main evolutionary and demographic processes shaping the 
observed genetic differences among populations from TN are still 
unresolved in the literature [15,42,43,44]. 

In the present study, we examined the Y-chromosomal lineages 
of 1,680 individuals sampled from 12 tribal and 19 non-tribal well- 
defined endogamous populations. We first investigated whether 
tribal and non-tribal groups shared a common genetic heritage 
and characterized the proportion of putatively autochthonous and 
non-autochthonous Indian Y-chromosomal haplogroups. It is 
important to note that the total sample size used here is higher 
than those in other studies covering the entire Indian subconti- 
nent. Further, the detailed anthropological annotation of endog- 
amous populations sampled from a restricted region within India, 
together with the paleoclimatic, archeological and historical
regional-background were all important aspects needed to reduce
the confounding relationships among socio-cultural factors. This
general approach allowed us to infer important genetic signals and
the finer details of the population demographic histories.
Therefore, we sought to determine which of the classifications 
based either on the Varna system (rank status, tribe-caste 
dichotomy), or social-cultural factors (reflecting subsistence, 
traditional customs and native language), or geography better 
indicated true endogamous groups by exhibiting higher between- 
population differences and lower within-population variation. 
Since both Y chromosome and caste designation are paternally 
inherited, we further explored whether any of these genetic 
differences could be attributed to the historical evidence for the
establishment of the Hindu Varna system. In contrast, we found
the overall Y-chromosomal patterns, the time depth of population
diversifications and the period of differentiation correlated better
with archaeological evidence and the demographic processes of
Neolithic agricultural expansions into the region. 


Materials and Methods 


Sampling Strategy 

Tamil Nadu, the land of Tamils (Tamil has the most ancient
literary tradition of all Dravidian languages), is the southeasternmost
province of India, measuring 130,058 km² with a population
of 62,405,679 (2001 Indian Census: http://www.censusindia.gov.in),
the majority living in 17,272 villages. We sampled a total of
1,680 men, avoiding relatives to the third degree, from 12 tribal 
and 19 non-tribal endogamous populations, which were selected 
for their cultural uniqueness, geographical spread, and ethno- 
graphic features. Samples from tribal participants were collected in 
their isolated native villages and settlements from the tropical 
forests of Western Ghats on the west side of TN. In contrast, non-tribal
populations exhibit larger census sizes and geographical
spread and they were sampled in colleges and community
gatherings, covering 8% of the total villages from TN (see
Figure 1 for sampling locations). The institutional Ethical
Committees of Madurai Kamaraj University and the University
of Pennsylvania (USA) approved the protocol and ethical
clearance of the study. The project was explained to the volunteers
through local contacts or community leaders in their local
languages and signed informed consent was obtained before
samples were collected. Permission to utilize pre-existing samples
from Nilgiri tribes (N=570) was obtained from the relevant
institution (Nilgiris Adivasi Welfare Association). Further genotyping
of 17 Y-STRs and deeper Y-SNPs was performed on 46
samples of Piramalai Kallar, 40 samples of Sourashtra and 107
samples of Yadhava used in a previous study [19].


Figure 1. Tamil Nadu map showing the sampling location of
the 12 tribal (squares) and 19 non-tribal (circles) populations.
The majority of tribal populations are located in the mountains of the
Western Ghats. The color codes are: Red - Hill Tribe Foragers (HTF);
Turquoise - Hill Tribe Cremating (HTC); Green - Hill Tribe Kannada
(HTK); Grey - Scheduled Castes (SC); Pink - Dry-Land Farmers (DLF); Deep
Blue - Artisans and Warriors (AW) and Yellow - Brahmin related (BRH).
Population abbreviations are as shown in Table 1.
doi:10.1371/journal.pone.0050269.g001

While many previous Indian population studies aimed to 
elucidate the main processes involved in the genesis of the social 
stratification by pooling populations into broad classifications such 
as caste-tribe dichotomy and social hierarchy [6,13,45,46], we 
sought to explore whether alternative classifications could better 
reflect the relationships among the true endogamous groups by 
increasing between-population differences and reducing within- 
population variation [24]. We considered a partition of the 31 
endogamous populations into seven Major Population Groups 
(MPG) based on socio-cultural factors primarily reflecting
subsistence, traditional customs and native language
[47,48,49,50], which we contrasted with alternative groupings.
The defining features for these MPGs were the following: (1) ‘Hill
Tribe - Foragers’ (HTF), tribal populations sharing a foraging
mode of subsistence and speaking their own Dravidian (Tamil/
Malayalam) dialects; (2) ‘Hill Tribes - Cremating’ (HTC), tribes
who cremate their dead, a unique socio-cultural feature among
these tribal populations; (3) ‘Hill Tribes - Kannada-Speakers’
(HTK), hunter-gatherer tribes speaking the Kannada (Dravidian)
languages; (4) ‘Scheduled Castes’ (SC), designated by the Indian
Government as non-land owning laborers, ranked lowest in the
Varna system; (5) ‘Dry Land Farmers’ (DLF), populations living by
dry-land farming subsistence, cultivating crops (millets and grains)
that do not require irrigation technology; (6) ‘Artisans and
Warriors’ (AW), populations that are traditionally warriors or
artisans of various kinds; and (7) ‘Brahmin Related’ (BRH),
following the Vedic traditions with a good knowledge of water
management and wet land irrigation. The populations included in
each of the seven MPG and their ethnographic notes are given in 
Table 1. Although it may appear that the proxies used for 
grouping the populations mix criteria in non-uniform and 
arbitrary ways, we followed a systematic, step-by-step approach 
to test and validate these classifications by comparing them with 
other groupings employed in the literature. Endogamous popula- 
tions were initially sampled taking caste-tribe and social hierarchy 
into consideration. After considering their ethnographic histories 
in greater detail, we tested whether tribes with common cultural 
features tended to share a similar genetic makeup, and whether 
population groups differentiated better when clustered according 
to socio-cultural factors reflecting their mode of subsistence,
traditional customs, and native language. It is important to stress
that many of the criteria used in the classification based on the
seven MPG are to some degree correlated with previous methods
employed to classify Indian populations (such as tribe-caste
dichotomy, or caste-rank hierarchy). It could be argued that the
seven MPG method may not be the best possible arrangement
from the perspective of explaining the entire cultural variation in
TN. However, it captures the observed pattern of genetic variation
slightly better than any of the previously attempted models (see
Results Section). Finally, we recognize that there is always a
degree of arbitrariness in all the methods used to classify endogamous
populations, but all of them are just subtle variations around the
same theme: economy or mode of subsistence.




Table 1. Description of the 31 tribal and non-tribal endogamous populations studied.

Columns: Major Group; Code; Population Name; Linguistic Family; Native Language; Social Rank; Mode of Subsistence; District sampled; Coordinates (latitude/longitude); Census.

[Row values not recoverable from the rotated scan. Populations legible by major group:]
HTF - Hill Tribe Foragers: Paniya, Paliyan, Pulayar, Irula, Kadar
HTC - Hill Tribe Cremating: Kanikaran, Thoda, Kota
HTK - Hill Tribe Kannada: Betta Kurumba, Kattunaickan, Kurumba, Mullukurumba
SC - Scheduled Caste: Parayar NTN, Parayar, Pallar, Paravar
DLF - Dry Land Farmers: Yadhava, Vanniyar, Vanniyar NTN, Nadar TNV, Nadar Cape, Piramalai Kallar, Maravar
AW - Artisans and Warriors: Valayar, Tamil Jains, Ezhava, Mukkuvar
BRH - Brahmin related: Sourashtra, Brahacharanam, Iyengar




Y-Chromosomal Analysis 


DNAs were extracted from blood or mouth-wash samples using
standard methods [19]. Samples were genotyped for single
nucleotide polymorphisms (SNPs) with a set of 23 custom TaqMan
assays (Applied Biosystems) using a 7900HT Fast Real-Time PCR
System. In addition, 19 Y-chromosomal short tandem repeat
(STR) and 6 SNP loci (Y-filer™ and Multiplex II Kits, ABI) were
genotyped using an ABI 3130XL Gene Analyzer, and fragment
sizes were determined using the GeneMapper Analysis Software
(v3.2, ABI) as described elsewhere [51]. Genotypes were validated
by testing reference samples from Coriell and the Genographic
Consortium. The multi-copy markers DYS385a and DYS385b
were excluded from further analyses because of ambiguity in
distinguishing these loci. Y chromosome haplogroups (HGs) and
paragroups were determined according to the 2008 YCC
nomenclature [52].


Statistical Analysis

The software ARLEQUIN 3.11 [53] was employed to compute
Nei’s D (Nei 1987) and conduct AMOVA [54] using both Y-
chromosome HG frequencies and haplotype data. Fisher exact 
tests were carried out among populations and MPGs to identify 
significantly over- or under-represented HGs. Among those over- 
represented HGs that tended to characterize any given MPG, 
Fisher exact tests were further performed on the number of 
populations over-represented in the given HG within the MPG 
versus those outside of the MPG to quantify the significance of 
such associations. Principal Component Analysis (PCA) [55] was 
performed using HG frequencies, centered without variance 
normalization [56] and with the significant components identified 
by employing the scree-plot method [57] using R, version 2.9.1
(http://www.r-project.org/). The same software was used
to perform non-metric multidimensional scaling (MDS) [58] using
RST distances generated from the 17 Y-STR data of the TN
populations, using ARLEQUIN. The relative HG age estimates 
were based on the variance of 17 STRs of the most frequent HGs 
for the seven MPG as previously described [51]. 
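
The over-representation test described above can be illustrated with a minimal sketch. It assumes a one-sided Fisher exact test computed as a hypergeometric upper tail; the function name and the counts in the usage line are hypothetical and are not taken from the study, which does not state which software or sidedness was used for these tests.

```python
from math import comb


def fisher_over_representation(k, n, K, N):
    """One-sided Fisher exact p-value for over-representation.

    Hypothetical helper (not from the paper). Returns P(X >= k), where X is
    the number of haplogroup carriers in a group of n men drawn without
    replacement from N men, K of whom carry the haplogroup.
    """
    upper = min(n, K)          # X cannot exceed the group size or the carrier count
    total = comb(N, n)         # number of ways to choose the group
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Illustrative counts only: 30 carriers among 100 men in one group,
# against 60 carriers among 1,000 men overall.
p_value = fisher_over_representation(30, 100, 60, 1000)
print(f"one-sided p = {p_value:.3g}")
```

A small p-value here would flag the haplogroup as over-represented in that group; the under-representation test is the mirror image, summing the lower tail instead.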

We considered the problem of how to quantify the significance 
of the difference between specific population group structures. 
AMOVA’s resampling scheme compares individual group struc- 
tures to the whole ensemble of randomly varied assignments of 
populations to groups, as well as of samples to populations. This 
tests the hypothesis that a specific group structure represents 
organization of the genetics among populations better than would 
be expected by chance. In our case, we had the different problem 
of testing whether one group structure was significantly better than 
another group structure. In this case, assignments were already 
determined, and likely are both already better than expected by 
chance. The question we tested was whether variation in data
randomly drawn from a population could, by chance alone, have
produced enough variation in the AMOVA results to account for
the differences between the specific group assignments being
compared. Hence we resampled the STR haplotypes with replacement,
modeled by a multinomial distribution, and computed the
median and 95%CI’s of the results using R, version 2.9.1. We 
tested resampling sizes up to 5,000 times, and found that 500 were 
sufficient to give reasonable accuracy on the median and 
confidence interval estimates. We therefore resampled each 
configuration only 500 times. 
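
The resampling idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: the statistic is a simple stand-in (Nei's unbiased haplotype diversity) rather than the AMOVA variance components, the function names are hypothetical, and the haplotypes are invented. Drawing n haplotypes with replacement from the observed sample is equivalent to a multinomial draw over the observed haplotype frequencies.

```python
import random
from collections import Counter
from statistics import median


def gene_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return (n / (n - 1)) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def bootstrap_ci(haplotypes, statistic, n_boot=500, seed=0):
    """Median and approximate 95% percentile interval of a statistic.

    Each replicate redraws len(haplotypes) haplotypes with replacement.
    n_boot=500 mirrors the number of resamples reported in the text.
    """
    rng = random.Random(seed)
    n = len(haplotypes)
    reps = sorted(statistic(rng.choices(haplotypes, k=n)) for _ in range(n_boot))
    lower = reps[int(0.025 * (n_boot - 1))]
    upper = reps[int(round(0.975 * (n_boot - 1)))]
    return median(reps), (lower, upper)


# Invented two-locus haplotypes (repeat counts), for illustration only.
sample = [(14, 12)] * 20 + [(15, 12)] * 10 + [(14, 13)] * 5 + [(16, 11)] * 5
med, (lo, hi) = bootstrap_ci(sample, gene_diversity)
print(f"median = {med:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

To compare two group structures in the spirit of the text, one would compute the statistic of interest for each structure on every resample and ask whether the resulting intervals are separated.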

The phylogenetic relationships among Y-STR haplotypes
drawn from individual haplogroups were estimated with the
reduced-median (RM) network algorithm in the program Network
4.5.0 [59,60], applying weights inverse to averaged haplotype
variance and reduced median reduction coefficient set at 1.0. This
program creates a tree topology based on the interrelationships of
the emergence and transmission of mutations in the respective
haplotypes. Even under simplifying conditions, the construction of
this simple combinatorial structure is algorithmically difficult, and
diverse algorithms give different answers. This result can be
informative if some subset of the results is consistent among
models. Therefore, in addition to using Network for assessing the
phylogenetic relationships of Y-STR haplotypes, we also used
ULTRANET (http://www.dei-unipd.it/~ciompin/main/Sito/Ultranet.html),
where the underlying distance (metric) between
nodes is ultrametric. Since tree structures reflect an ultrametric
structure, an algorithm that maps the compatibility of associations
according to such a structure may be uniquely informative. This
approach, which is orthogonal to other phylogenetic approaches,
helped confirm the results observed in RM network analysis,
thereby validating the consistency of the population associations
with evolutionarily related haplotypes.


Table 1. Cont.

Code  Population Name  Linguistic Family  Native Language  Social Rank  Mode of Subsistence
VDM  Vadama  IE  Sanskrit  High  Wet Land Agriculture/Priests

Notes to Table 1:
Population code used in PCA & MDS plots.
All Brahmin-related castes in Tamil Nadu.
NTN (North Tamil Nadu), TNV (Tirunelveli).
DR (Dravidian), IE (Indo-European).
Sanskrit is the language of scriptures and ceremonies, but populations quickly adopted local cultures and languages.
Lower, Middle & Higher social ranks are self-perceived/assigned classifications.
Approximate coordinates.
2001 Census, Government of India, http://www.censusindia.gov.in.
1981 Indian Census.
1931 Indian Census.
1901 Indian Census.
Estimated census size.
No information available.
doi:10.1371/journal.pone.0050269.t001

Coalescence methods, as implemented in BATWING [61], 
were applied to several different subsets of populations to quantify 
major underlying demographic events, estimate divergence times 
and assess the phylogenetic relationships among TN populations. 
One of the major characteristics of BATWING is that the trees it 
produces are constructed on the assumption of no gene flow 
among demes. The proportion of samples that the Metropolis-Hastings 
algorithm assigns to each tree gives some sense of the 
strength of that candidate tree in representing the data. These 
estimates account for the impact of mutation histories through the 
likelihood scores obtained over the distributions of priors for 
mutation rates and other demographic parameters. The outcome 
of these estimates is that modal, and near modal, trees will show a 
somewhat filtered view of the genetics contributing to the most 
likely trees observed. Given these considerations, BATWING is 
expected a priori to be appropriate for testing whether major 
population differentiation occurred before or after the Varna 
system was historically established in TN, under the assumption of 
restricted admixture among populations under this social 
organization and structured endogamous system. The various testing 
procedures described above, including MDS, PCA, the AMOVA 
tests for differentiation, and the Fisher tests, were further applied 
to establish whether there was a signal for common gene pools 
among populations, as required for typical BATWING analyses. 

In addition, BATWING admixture validation tests [62] of the 
TN data were applied under three simulated potential scenarios. 
In the first scenario, an individual population (Paniya) was 
randomly split, and the BATWING analysis of the population 
split time was performed. BATWING generally produced a 
median time of less than 500 years, with the 95% confidence 
intervals (CI) covering only the last two generations. In the second 
scenario, recent gene flow was modeled between two populations 
(Paniya and Brahacharanam) estimated by BATWING to have 
already been isolated for a significant time (19.5 Kya) by randomly 
mixing different proportions of chromosomes from each 
population. BATWING gave much younger population divergence 
estimates (9.3 Kya) than the unmixed split, even with only 5% of 
the Y-chromosomes mixed randomly between the two populations, 
and a 10% mix between populations decreased the 
divergence time estimates by more than 50% (3 Kya). In the 
third scenario, we explored the impact on BATWING estimates of 
randomly introducing an in-migrating population (Paniya) carrying 
new paternal lineages into two differentiated demes 
(Brahacharanam and Kota; split time estimated at 4.7 Kya). These 
estimates were only slightly affected (the split time actually 
appeared to increase to 6.2 Kya) when the in-migrating proportion 
did not exceed 40-50%. At that point, the modal 
trees were dominated by the in-migrating population. Overall, the 
results of the BATWING admixture tests based on data from the 
TN populations were similar to those observed in a study of 
religious populations within Lebanon [62]. Therefore, BATWING 
generally seems to show little sensitivity to gene flow from 
immigrants bringing new paternal lineages (different HGs) into the 
parent population, but is very sensitive to gene flow between 
populations sharing paternal lineages from the same HGs. 
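
The second validation scenario, in which a fixed proportion of chromosomes is mixed at random between two samples before re-running the analysis, can be sketched as below. This is a hedged illustration: the function `mix_populations`, the labels `PNY` and `BRC`, and the choice to swap chromosomes so that sample sizes stay constant are assumptions for the example, not a description of the authors' scripts.

```python
import random

def mix_populations(pop_a, pop_b, proportion, seed=0):
    """Swap a random `proportion` of chromosomes between two samples.

    `pop_a` and `pop_b` are lists of haplotypes (any objects).
    The number swapped is based on the smaller sample, so both samples
    keep their original sizes.
    """
    rng = random.Random(seed)
    a, b = list(pop_a), list(pop_b)
    n_swap = round(proportion * min(len(a), len(b)))
    idx_a = rng.sample(range(len(a)), n_swap)
    idx_b = rng.sample(range(len(b)), n_swap)
    for i, j in zip(idx_a, idx_b):
        a[i], b[j] = b[j], a[i]
    return a, b

# Hypothetical labelled chromosomes standing in for real haplotypes.
paniya = [("PNY", k) for k in range(40)]
brahacharanam = [("BRC", k) for k in range(40)]
mixed_a, mixed_b = mix_populations(paniya, brahacharanam, proportion=0.05)
```

Each mixed pair of samples would then be passed to the coalescent analysis, and the resulting split time compared with the unmixed estimate.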

Besides assuming no gene flow, BATWING presupposes that 
the population samples are random. As a result, using BATWING 
to analyze the histories of individual HGs drawn from populations 
yields dramatically different estimates of coalescence times, times 
of expansion, and other population parameters because, as 
mentioned in the admixture modeling, BATWING is more 
sensitive to admixture than in-migration. Thus, BATWING may 
be applied to individual HGs to extract information about specific 
in-migration events. Further, HGs that tend to correlate strongly 
with overall population estimates are likely to be more 
representative of their common ancestral gene pool. These results may be 
expected in that selection of the modal population trees will tend 
to preserve configurations where the most common of the shared 
lineages comprise the strongest signals contributing to the 
likelihood function. Therefore, selection of modal trees acts as a 
filter that tends to exclude immigrating contributions, although it 
will be heavily influenced by inter-population migration. 

In these BATWING estimates, mutation rate priors were those 
previously proposed [63] based on the effective mutation rates 
previously cited [64]. Between 1.5 and 3.5 million Monte Carlo 
(MC) samples were collected. Equilibration was generally accepted 
after 500,000 MC samples, as judged by the decay to 
equilibrium of global estimates of effective population size and the 
relative constancy of quantile measurements extracted from the 
equilibrated regions. Times associated with clusters identified by 
RM networks as indicating evolution within populations were 
estimated using UEPtmin and UEPtmax estimates within 
BATWING. When computing population splits, large numbers of 
populations tend to produce cross-talk between bifurcations on 
different branches. A way to resolve this cross-talk is to set up 
multiple runs with the various branches pooled except for the 
primary branch under consideration. This approach also provides 
an opportunity to check the consistency of split times of the parent 
branches common to the pooled topologies. Composite trees may 
then be constructed from the results of the multiple runs. SNPs 
selected as unique evolutionary polymorphisms (UEPs) in 
computations of population split times depended on the representation of 
variation through each of the populations being considered, or 
through the pooled populations for UEP time estimates. 


Results 


NRY landscape of Tamil Nadu reveals predominantly 
autochthonous lineages 

A total of 21 Y chromosome HGs were identified in the study 
populations (Table 2). The overall HG diversity among 
populations was 0.886±0.003; tribal populations exhibited lower 
diversity (0.796±0.013) than non-tribal populations 
(0.881±0.004). The majority of this genetic variation (82%) was 
accounted for by seven HGs: H1-M52 (17.4%), F*-M89 (16.3%), 
L1-M27 (14.0%), R1a1-M17 (12.7%), J2-M172 (9.4%), R2-M124 
(8.2%) and H-M69 (4.7%). It should be noted that 90% of the 
C-M130 samples reported here (66 out of 74) were positive for 
C5-M356, while the rest were negative for both C3-M217 and 
C5-M356 (Table S1). 
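
A diversity value of this kind can be computed from haplogroup assignments as sketched below. The sketch assumes the commonly used unbiased gene diversity estimator, n/(n-1) multiplied by one minus the sum of squared frequencies; the section itself does not state the estimator, and the ten-chromosome sample is hypothetical.

```python
from collections import Counter

def haplogroup_diversity(assignments):
    """Unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(assignments)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(assignments)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

# Hypothetical sample of ten chromosomes, not taken from Table 2.
sample = ["H1-M52"] * 4 + ["F*-M89"] * 3 + ["L1-M27"] * 2 + ["R1a1-M17"]
h = haplogroup_diversity(sample)
```

A sample dominated by one or two haplogroups gives a lower value, which is the pattern reported here for the tribal populations.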


Table 2. Y chromosome haplogroup frequencies (%) in the 31 populations from Tamil Nadu. 

SD (Standard Deviation). 

doi:10.1371/journal.pone.0050269.t002 


The geographical origins of many of these HGs are still debated. 
However, the associated high frequencies and haplotype variances 
of HGs H-M69, F*-M89, R1a1-M17, L1-M27, R2-M124 and 
C5-M356 within India have been interpreted as evidence of 
autochthonous origins of these lineages during the late Pleistocene 
(10-30 Kya), while the lower frequency within the subcontinent of 
J2-M172, E-M96, G-M201 and L3-M357 is viewed as reflecting 
probable gene flow introduced from West Eurasian Holocene 
migrations in the last 10 Kya [6,7,16,23]. Assuming these 
geographical origins of the HGs to be the most likely ones, the 
putatively autochthonous lineages accounted for 81.4±0.95% of 
the total genetic composition of TN populations in the present 
study. These results are concordant with earlier studies based on 
autosomal markers and haploid loci in suggesting lower gene flow 
from West and Central Asia to south India compared to north 
India [5,11,23,65]. Additionally, our results indicate a potentially 
differential genetic impact of these migrations on tribal versus 
non-tribal groups. For example, the proportion of non-autochthonous 
Indian lineages was found to be significantly higher (p<0.0001) 
among non-tribal populations (13.7±1.03%) than among the 
tribal populations (7.4±1.09%). In contrast, the proportion of 
likely autochthonous lineages among the tribal populations 
(87.7±1.37%) was significantly higher (Fisher test: p<0.0001) than 
in non-tribal populations (78.1±1.24%). 
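
The uncertainty attached to these proportions is consistent with a binomial standard error, which can be sketched as follows. Treating the reported uncertainty as a binomial standard error, and taking 1,680 chromosomes as the total sample size, are both assumptions made for this illustration.

```python
import math

def proportion_se(p, n):
    """Binomial standard error of a proportion p estimated from n samples."""
    return math.sqrt(p * (1.0 - p) / n)

# 81.4% putatively autochthonous lineages; n = 1680 is an assumed
# total sample size, and the binomial form is an assumed model.
se_percent = 100.0 * proportion_se(0.814, 1680)
```

Under these assumptions the standard error is about 0.95 percentage points, in line with the 81.4±0.95% quoted above.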


Genetic structure of Tamil Nadu populations is best 
correlated with subsistence practices 

AMOVA using both HGs and STR distances (Rst) was applied 
to several different models of population differentiation to assess 
the proportion of genetic variation explained by geography, 
tribe-caste dichotomy, caste-rank hierarchy, and other socio-cultural 
factors reflecting subsistence practices (Table 3, Table S2). The 
highest genetic variation among groups for classifications involving 
all populations (Fct = 0.065; among resampled data, median = 0.064, 
95% CI: 0.052-0.078) and the lowest variation within groups 
(Fsc = 0.040; median = 0.062; 0.05-0.074) were observed when 
populations were classified into the seven MPGs based on 
subsistence. Further analyses considering only the four non-tribal 
groups revealed a four-fold decrease in genetic variation among 
groups (Fct = 0.015; median = 0.014; 0.003-0.026) when 
compared to the three tribal groups alone (Fct = 0.095; 
median = 0.095; 0.066-0.129). Moreover, the exclusion of HTF 
reduced the between-group variance by more than two-fold 
(6.5% to 2.7%), while exclusion of HTK and BRH had little 
impact. On the other hand, the exclusion of BRH from non-tribal 
groups reduced the between-group variation threefold (1.5% to 
0.4%). 

To determine if the number of groups taken into consideration 
had a significant impact on the Fct values obtained, we compared 
the mean and 95% CI of the null distribution of Va (among-group 
variance, data not presented) that is used to estimate the Fct 
index. It is logical that the Va null distribution would vary with 
different groupings if the relative impact of groups is high. 
Contrary to this, we found that the mean and the standard 
deviations of the null distribution did not vary much among 
groupings (Table 3), suggesting that the number of groups 
taken into consideration did not have much impact on the Fct 
estimates. Further, the 95% CIs of the AMOVA estimates 
computed by re-sampling 500 haplotypes with replacement across 
populations showed that the 95% CI of the 7-MPG classification was 
significantly higher than that of grouping by geography or Varna 
rank status (Table S2). 
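
The three AMOVA indices discussed above are tied together by the standard hierarchical relation (1 - Fst) = (1 - Fsc)(1 - Fct). The sketch below shows how they follow from the variance components and checks the relation against the STR-based values for the 7-MPG grouping; the function and variable names are illustrative, and the variance components themselves are not given in this section.

```python
def fixation_indices(v_a, v_b, v_c):
    """Fixation indices from AMOVA variance components.

    v_a: among groups; v_b: among populations within groups;
    v_c: within populations.
    """
    total = v_a + v_b + v_c
    f_ct = v_a / total
    f_sc = v_b / (v_b + v_c)
    f_st = (v_a + v_b) / total
    return f_ct, f_sc, f_st

def f_st_from(f_ct, f_sc):
    """Identity linking the three indices: (1 - Fst) = (1 - Fsc)(1 - Fct)."""
    return 1.0 - (1.0 - f_sc) * (1.0 - f_ct)

# STR-based values reported for the 7-MPG grouping in Table 3.
f_st_check = f_st_from(f_ct=0.065, f_sc=0.040)
```

With Fct = 0.065 and Fsc = 0.040 the relation gives an Fst of about 0.102, which agrees with the value listed for this grouping in Table 3.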

The PCA and MDS analyses of HG frequencies and Rst 
distances reflected the AMOVA results (Figures 2a, 2b). In the 
PCA, the first two components accounted for 36.86% of the 
variance, while in the MDS analysis a stress value of 15.6% was 
obtained when the objects were clustered in two dimensions. This 
stress value is significant in the light of the work of Sturrock and 
Rocha, 2000 [66]. In both plots, two tribal (HTF, HTK) and the 
non-tribal Brahmin (BRH) groups formed distinct and distant 
clusters, while the rest were interspersed in their midst. 

Table 3. Analysis of molecular variance (AMOVA). 

Fct: among groups. Fsc: among populations within groups. Fst: within populations. 

Populations Grouping | No of groups | Fct SNPs | Fct STRs | Fsc SNPs | Fsc STRs | Fst SNPs | Fst STRs 
All 31 populations | 1 | | | | | 0.103 | 0.093 
Geography | 9 | 0.025° | 0.035" | 0.083 | 0.063 | 0.106 | 0.096 
Socio-Cultural Factors 
7 Major Populations Groups (MPG) | 7 | 0.082? | 0.065° | 0.036 | 0.040 | 0.114 | 0.102 
HTF excluded | 6 | 0.035° | 0.026° | 0.027 | 0.034 | 0.061 | 0.060 
BRH excluded | 6 | 0.077? | 0.059° | 0.037 | 0.042 | 0.111 | 0.099 
HTK excluded | 6 | 0.082° | 0.062° | 0.031 | 0.039 | 0.111 | 0.099 
Caste vs Tribe | 2 | 0.075° | 0.062" | 0.069 | 0.065 | 0.139 | 0.124 
TR-UP-MID-LOW | 4 | 0.057° | 0.047° | 0.065 | 0.063 | 0.119 | 0.107 
Tribes Only 
HTF-HTK-HTC | 3 | 0.110b | 0.095° | 0.081 | 0.079 | 0.182 | 0.167 
Non-tribes (Castes) Only 
UP-MID-LOW | 3 | 0.019" | 0.015" | 0.024 | 0.030 | 0.042 | 0.044 
SC-DLF-AW-BRH | 4 | 0.023° | 0.015" | 0.017 | 0.026 | 0.039 | 0.041 
SC-DLF-AW | 3 | 0.009b | 0.004d | 0.016 | 0.027 | 0.025 | 0.031 

a P<0.00001. 
b P<0.001. 
c P<0.01. 
d Not significant, P<0.2. 

TR (Tribes), HTF (Hill Tribe Foragers), BRH (Brahmins), HTK (Hill Tribe Kannada speakers), SC (Schedule Castes), DLF (Dry Land Farmers), AW (Artisan & Warriors). 

HG, MID, LOW - High, Middle and Low caste-rank hierarchy as described in Table 1. 

Endogamous populations were grouped based on geography, tribe-caste dichotomy, caste-rank hierarchy, and socio-cultural features mainly reflecting subsistence (7 
Major Population Groups, MPG). The maximal genetic variation among groups (Fct) and the minimal variation among populations within groups (Fsc) were observed 
when populations were grouped based on the 7 MPG classification. 
doi:10.1371/journal.pone.0050269.t003 

Interestingly, the same tribal groups showed greater genetic 
similarities to other Dravidian tribes from the southern states of 
Andhra Pradesh and Orissa, and TN BRH clustered with 
IE-speaking populations from multiple regions, when the present data 
set was compared with 97 populations from India and neighboring 
regions by PCA (Figure S1, Table S3). The historical migrations of 
BRH into TN and the long-term isolation of some Dravidian 
tribal groups already reported in previous studies [15,17,25] could 
potentially explain why HTF, HTK and BRH groups exhibited 
greater genetic similarities with those culturally related populations 
outside of TN. Taken together, the PCA, MDS and AMOVA 
results all indicate strong genetic structure among TN populations. 
They further suggest that the MPG classification based on 
socio-cultural factors reflecting subsistence better reproduces true 
endogamous groups by increasing between-population differences 
and reducing within-population variation. 


Non-homogenous HG distributions among constituent 
populations of MPGs 


Fisher exact tests indicated that various HGs were significantly 
predominant in one or another MPG (Table S4). The highest 
frequency of F*-M89 (53.3%) was observed among HTF 
(p<0.0001), while H1-M52 showed the highest frequency 
(42.5%) in HTK (p<0.0001). Among the non-tribal groups, 
BRH showed 42.2% of R1a1-M17 (p<0.0001), and L1-M27 
appeared at a higher frequency (24.1%; p<0.0001) among DLF. 
However, wide variation in HG frequency and composition was 
observed among the populations included in each of these MPGs 
(Table 2). For example, the proportion of F*-M89 in HTF ranged 
from 75% to 28.6% among the constituent populations. A similar 
pattern was observed in other MPGs characterized by H1-M52 in 
HTK and L1-M27 in DLF. Thus, not all the constituent 
endogamous populations in an MPG shared a similar genetic 
makeup, indicating the differential influence of evolutionary forces 
such as drift, fragmentation, long-term isolation or admixture. 
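
A Fisher exact test of whether one HG is over-represented in one MPG relative to all others can be sketched for a 2x2 table as below. This is a minimal pure-Python illustration of the two-sided test; the counts in the example are hypothetical and are not the study's data, and the software actually used by the authors is not stated in this section.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: carriers / non-carriers of one HG in one MPG
# versus all other MPGs (not the study's actual counts).
p_value = fisher_exact_two_sided(30, 26, 60, 400)
```

The rows of the table are the focal MPG and all remaining MPGs, and the columns are chromosomes that do and do not carry the HG being tested.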

In addition, Fisher exact tests were used to determine the 
probability of observing multiple populations within an MPG 
sharing the same over- or under-represented HGs by chance (e.g., 
random demic assimilation into an MPG from already 
differentiated endogamous populations) or because of the systemic 
inheritance of ancestral lineages among the constituting 
populations of MPGs. Our results rejected the hypothesis that random 
processes could have caused the significant over-representation of 
F*-M89 in HTF+HTK populations (p<0.0001), L1-M27 in DLF 
populations (p<0.001), H1-M52 in HTK populations (p<0.0001), 
and R1a1-M17 in BRH populations (p=0.001). Likewise, 
significant results were obtained for under-representation of 
F*-M89 in all BRH populations (p=0.043), L1-M27 in HTF 
populations (p=0.02) and R1a1-M17 in HTF populations 
(p=0.003). Together, these results argue for the distinctiveness 
of the ancestral gene pools for MPGs and the shared heritage of 
these paternal lineages among populations within MPGs, in spite 
of their non-homogenous distribution. Further, the 
over-represented HGs marking MPGs explain in part some of the 
organization observed in the PCA and MDS results, and also 
yield insight into the differentiations noted in the AMOVA 
results. 

Figure 2. Plots representing the genetic relationships among 
the 31 tribal and non-tribal populations of Tamil Nadu. (A) PCA 
plot based on HG frequencies (axes: PC1, 22.27%; PC2, 14.59%). 
The two dimensions display 36% of the 
total variance. The contribution of the first four HGs is superimposed as 
grey component loading vectors: the HTF populations clustered in the 
direction of the F-M89 vector, HTK in the H1-M52 vector, BRH in the 
R1a1-M17 vector, while the HG L1-M27 is less significant in 
discriminating populations. (B) MDS plot based on 17 microsatellite 
loci Rst distances. The two tribal groups (HTF and HTK) are clustered at 
the left side of the plot while BRH form a distant cluster at the opposite 
side. The colors and symbols are the same as shown in Figure 1, while 
population abbreviations are as shown in Table 1. 
doi:10.1371/journal.pone.0050269.g002 



Figure 3. Reduced median network of 17 microsatellite 
haplotypes within haplogroup F-M89. The network depicts clear 
isolated evolution among HTF populations with a few shared 
haplotypes between Kurumba (HTK) and Irula (HTF) populations. Circles 
are colored based on the 7 Major Population Groups as shown in 
Figure 1, and the area is proportional to the frequency of the sampled 
haplotypes. Branch lengths between circles are proportional to the 
number of mutations separating haplotypes. 
doi:10.1371/journal.pone.0050269.g003 


Reduced median network analysis identifies strong 
founder effects among tribal populations 

RM networks were constructed to evaluate HG diversification 
within TN populations. Here, low-reticulated networks with 
branches showing segregation by population were expected if 
strong founder effects had shaped variation in paternal lineages, 
particularly in the HGs overrepresented in MPGs. By contrast, 
reticulated networks exhibiting shared STR haplotypes between 
populations from different MPGs would indicate that contempo- 
rary populations were derived from descendants drawn from 
differing sources carrying disparate and diverse STR haplotypes, 
suggesting potential admixture among populations. Long branches 
with multiple unoccupied steps (internodes) connecting constituent 
haplotypes would suggest strong genetic drift or possibly sporadic 
intrusion from a genetically distinct source. 

F*-M89 was the only HG showing clear population-specific 
clusters (Paniya, Paliyan and Irula of HTF), suggesting long-term 
isolation (Figure 3). In contrast, none of the other RM networks 
showed population-specific clusters; all were reticulated, with 
long branches having multiple internodes (Figure S2a to S2e). 
Overall, these results suggest that both genetic drift (possibly due to 
founder effects) and admixture may be common features of the 
studied populations. The combination of low segregation among 
RM networks and higher diversity may result from a period of 
assimilation of diverse sources into a larger common gene pool 
from which the modern populations were subsequently drawn. 


HG age estimates are older in non-tribal groups 

Table 4. Haplogroup variances and age estimates based on 17 microsatellite loci. 

Haplogroup age estimates are given in years; groups with too few STR haplotypes (samples) were excluded from calculations. Non-tribal groups (castes) displayed the oldest age estimates for most of the Y chromosome haplogroups. 

Var (Variance), SE (Standard Error), SD (Standard Deviation). 

doi:10.1371/journal.pone.0050269.t004 

Figure 4. Modal tree obtained by BATWING indicating the 
coalescence divergence time estimates (in years) among Major 
Population Groups (MPG) after using 17 STRs from all 
haplogroups. BATWING estimates suggest that all population groups 
started to diverge 7.1 Kya (95% CI: 5.5-9.2 Kya), with limited admixture 
among them for the last 3.0 Kya (2.3-4.3 Kya), the youngest divergence 
time estimate. The modal tree shows two differentiated nodes with 
clearly overlapping estimates of the splits: a first node including one of 
the tribal groups (HTC) together with all the non-tribal MPGs (castes) 
with a divergence time of 6.2 Kya (4.7-8.0 Kya), while the second node 
embraces the HTF and HTK tribal groups with an estimated divergence 
between them of 4.9 Kya (3.6-7.1 Kya). 

doi:10.1371/journal.pone.0050269.g004 

Tribes are generally considered to be the descendants of the early 
settlers of India and, therefore, to better depict the autochthonous 
genetic composition of India than non-tribal populations 
[2,12,15,67]. An association between high frequency and high STR 
variance of an HG in a population is a potential indicator of 
long-term in-situ diversification, and may also indicate the likely 
source of the HG in other populations. We therefore investigated 
whether tribal populations possess older genetic lineages, and 
could thus be the potential sources of these lineages for other 
populations, by computing HG age estimates based on Y-STR 
variances (Table 4). The age estimates for all HGs exceeded 
10-15 Kya, with overlapping confidence intervals among MPGs. 
Further, MPGs exhibiting high frequencies of specific HGs did not 
show the oldest age estimates. Interestingly, non-tribal groups 
exhibited older age estimates than tribal groups for all HGs 
except R2-M124. These results indicate that tribal and 
non-tribal populations share a genetic heritage dating back to at least 
the late Pleistocene (10-30 Kya). The HG age estimates presented 
here are similar to those generated for the same HGs in earlier 
studies involving a similar or smaller number of samples taken from 
a broader geographic region of India [7,23]. 
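
A variance-based age estimate of the kind referred to above can be sketched as follows. The sketch assumes the simple linear relation between mean per-locus STR repeat variance and time, with an effective mutation rate of 6.9e-4 per locus per 25-year generation. Both numbers are assumptions for illustration, since this section gives the rate only by reference [64], and the input variance is hypothetical.

```python
def age_from_str_variance(mean_variance, rate_per_generation=6.9e-4, years_per_generation=25):
    """Linear variance-based age: variance / mutation rate, converted to years.

    The default rate and generation time are assumptions for illustration.
    """
    generations = mean_variance / rate_per_generation
    return generations * years_per_generation

# Hypothetical mean per-locus repeat variance of 0.4 within one haplogroup.
age_years = age_from_str_variance(0.4)
```

Under these assumed parameters a mean variance of 0.4 corresponds to roughly 14.5 Kya, so older age estimates simply reflect higher accumulated STR variance within the HG.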


BATWING estimates of genetic affinity and ancestry 

We configured several BATWING runs using different subsets 
of data to estimate the dates of population differentiation and 
explore the different demographic processes and affinities among 
the MPGs and their constituent populations. The first set of 
BATWING runs analyzed haplotypes from all HGs among all of 
the MPGs to investigate whether tribal and non-tribal MPGs have 
an independent origin or instead descended from a common 
ancestral gene pool. If tribal and non-tribal groups have 
independent origins, then it would be expected that population 
tree bifurcations marking the differentiation of these two groupings 
would exhibit very old divergence time estimates and non- 
overlapping confidence intervals (CIs). Figure 4 represents the 
modal tree obtained for this BATWING run. It shows that 
populations began to diverge around 7.1 Kya (95% CI: 5.5- 
9.2 Kya), and contains two differentiated nodes with clearly 
overlapping estimates of the splits. The first node separated the 
HTF and HTK tribal groups from the rest of the MPGs, with an 
estimated divergence time of 4.9 Kya (3.6-7.1 Kya), while the 
second included the other tribal group (HTC) and the non-tribal 
MPGs, with a divergence time of 6.2 Kya (4.7-8.0 Kya). These 
BATWING estimates suggest that all MPGs started to diverge 
during the same span of time with very limited admixture among 


PLOS ONE | www.plosone.org 




Genetic Structure of Southern Indian Populations 


them, at least for the last 3 Kya (2.3-4.3 Kya), the youngest time 
estimate. 
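
The overlap argument above can be checked directly from the reported numbers. The sketch below is only an illustration: it takes the three divergence estimates and 95% CIs quoted in the text for the all-haplogroup run and tests every pair of intervals for overlap. The dictionary labels and the helper name `intervals_overlap` are our own and do not come from BATWING.

```python
from itertools import combinations

# Divergence estimates (Kya) with 95% CI bounds, as quoted in the text for Figure 4.
splits = {
    "root (all MPGs)": (7.1, 5.5, 9.2),
    "HTC + non-tribal MPGs": (6.2, 4.7, 8.0),
    "HTF / HTK": (4.9, 3.6, 7.1),
}


def intervals_overlap(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Compare the CI of every pair of nodes.
overlaps = {
    (x, y): intervals_overlap(splits[x][1:], splits[y][1:])
    for x, y in combinations(splits, 2)
}
for pair, result in overlaps.items():
    print(pair, result)
```

Overlapping intervals do not prove simultaneous divergence; they only show that the data cannot separate the split times, which is the sense in which the text argues against independent origins.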

The second set of BATWING runs included only haplotypes 
from one of the most common HGs among MPGs. In this regard, 
we would like to emphasize that BATWING results using 
haplotypes from only one HG cannot be interpreted as population 
divergence times, but rather reflect the demographic histories of 
the specific paternal lineage among populations. Also, deviations 
from population estimates among the different runs could reflect 
in-migrations (gene flow) involving a particular HG rather than 
multiple paternal lineages obtained from assimilation from a 
common ancestral gene pool. For these reasons, we explored 
whether the paternal lineages for each HG originated from the 
MPG that exhibits the highest frequency of this HG as a way to 
identify sources and recipients of these Y-chromosomes. In 
addition, similar splitting patterns obtained for the different HG 
trees could be interpreted as demonstrating that the paternal 
lineages entered into the general gene pool from the same 
demographic event. BATWING constructed clear modal trees for 
three HGs (F*-M89, L1-M27 and H1-M52) but not for the others 
(R1a1-M17, H-M69, J2-M172 and R2-M124). The three modal 
trees (Figure S3a—S3c) exhibited very diverse branching patterns 
with tribal and non-tribal MPGs being mixed randomly and 
without the outgroups corresponding to the MPG with the highest 
HG frequency, as would be expected if this MPG were the main 
source of this paternal lineage for other populations. Estimates of 
the time to most recent ancestor (TMRCA) for the HGs ranged 
from 11.4 Kya for F*-M89 to 6.1 Kya for L1-M27. Similar dates 
marking the founding of the clusters identified in the HG F*-M89 
network with Ultranet clustering were obtained by BATWING 
using virtual UEPs to define clusters. The similar TMRCA 
estimates and the diverse tree topologies suggest that extant tribal 
and non-tribal groups derive from the ancient populations of the 
region, with population differentiation taking place at relatively 
similar times under complex demographic histories with multiple 
entries and sources of the common paternal lineages. 

Finally, a third set of BATWING runs was performed using all 
HGs from individual populations within selected MPGs to test 
whether the grouping of these populations could have affected 
BATWING estimates of population divergence and phylogenetic 
relationships (Figure S4a—S4c). All endogamous populations 
grouped according to their MPG classification in the BATWING 
trees with the exception of the HTF-Irula clustering with other 
HTK tribes. This result was not unexpected because the Irula and 
the Kurumba were seen to share STR haplotypes in the F*-M89 
and H*-M69 networks. BATWING estimated the differentiation 
between them to have occurred 3.4 Kya. In addition, BATWING 
assigned similar time frames to those in the previous two sets of 
runs, when major differentiation may have occurred among the 
endogamous populations, independently of the selected popula- 
tions used. Moreover, the two most recent split estimates obtained 
by BATWING runs using endogamous DLF populations agree 
with historical records, which indicate recent demographic 
expansions for the Vanniyars (2.3 Kya) and Nadars (1 Kya). 
These results further supported the classification of the seven 
MPGs, for which the population divergence time estimates were 
consistent for all sets of BATWING runs. 


Discussion 


The study populations from Tamil Nadu were characterized by 
an overwhelming proportion of Y-chromosomal lineages that 
likely originated within India, suggesting a low genetic influence 
from western Eurasian migrations in the last 10 Kya. Although 


November 2012 | Volume 7 | Issue 11 | e50269 


non-tribal groups exhibited a slightly higher proportion of non- 
autochthonous lineages than tribal populations, the common 
paternal lineages shared by TN populations are likely drawn from 
the same ancestral genetic pool that emerged in the late 
Pleistocene and early Holocene. We also noted that the current 
modes of subsistence have shaped the genetic structure of TN 
groups, with non-tribal populations being more genetically 
homogeneous than tribal populations likely due to differential 
levels of genetic isolation among them. Coalescence methods, 
employed to identify specific and distinctive periods when genetic 
differentiation among populations occurred, indicated a time scale 
of ~6,000 years. We discuss below whether the timing of the male 
genetic differentiation of the populations fits better with arche- 
ological and historical records for the implementation of the 
Hindu Varna system or with agricultural expansions in the TN 
region. 


Endogamous social stratification preexisted the Varna 
system 

Previous studies of Indian populations have grouped and 
analyzed the genetic data in the light of the Hindu Varna system 
[14,15,16] even though its origin and antiquity are still an ongoing 
topic of debate. One of the theories that has acquired wide support 
relates the establishment of the caste system to Indo-Aryan 
expansions from Western Eurasia into India around 3 Kya. An 
alternative view would see an earlier Indo-Aryan expansion with 
an introduction of cereal farming into Pakistan/North India 
around 8-7 Kya. The genetic evidence reported by other studies in 
support of these theories is mainly based on the high frequency of 
HG R1a1-M17 in Brahmin castes and their closer genetic affinity 
with West Eurasian populations compared to other Indian non- 
Brahmin castes and tribes [10,20]. However, admixture analyses 
supporting a West Eurasian origin of the Brahmin may be biased 
due to the high frequencies of R1a1-M17 shared between these 
populations, as the rest of their Y-chromosomal variation shows 
little similarity [6,7,16]. Moreover, the recent discovery of new 
markers within R1a1-M17 has allowed Eastern European Y- 
lineages to be differentiated from those in Central/South Asia, 
locating the oldest expansion times with this lineage in Indus 
Valley populations, suggesting an earlier, possibly autochthonous 
origin of this HG in South Asia [68]. The Brahmin populations in 
the present study are also characterized by a significantly higher 
frequency of R1a1-M17 relative to other TN groups, but without 
any significant frequencies for HGs having a likely origin outside 
India. The TN Brahmin populations also present a package of 
the most common HGs very similar to that observed in 600 Brahmin 
individuals from all over India [16]. We noted that the highest 
STR variances for HG R1a1-M17 observed in SC and DLF, 
along with the lack of population-specific clusters in the R1a1-M17 
network and the failure of BATWING to generate a 
definitive modal tree for this HG, all argue against the 
introduction of these paternal haplotypes through a single wave 
of Brahmin (i.e. Indo-Aryan) migration into the region. 

Literary works from the Sangam period (300 BCE to 300 CE) 
describe a heterogeneous society in TN that predates the incorporation 
of already established populations into the Hindu Varna system [22]. 
Ancient Tamil society was highly structured by habitat and 
occupation, where endogamy was practiced among populations 
known as kudi [37]. Many of the populations, such as the Valayar 
(meaning net weavers), Pulayar, Paliyan and Kadar, are cited in 
the Sangam literature using the same names that are employed 
today. Thus, a structured society practicing endogamy existed 
in TN prior to the inferred arrival of the Indo-Aryans to this 
region. It is therefore most likely that the Varna system was 
superimposed on the pre-existing and historically attested social 
system without significant population transfer or input, imple- 
menting a new social hierarchy and order during the Pallava/ 
Chola period from the 6th through 12th centuries CE [15,22]. 
However, the implementation of the Varna system may not have 
been uniform across pre-existing non-tribal populations, since many 
of the populations within DLF and the tribes neither practice 
Vedic rituals nor have a well-defined patrilineal system and clan 
exogamy. Overall, our results suggest that the genetic impact of 
Brahmin migrations into TN has been minimal and had no major 
effect on the establishment of the genetic structure currently 
detected in the region. 


Models of agricultural expansions in the study region 
correlate with patterns of genetic diversity 

The present study shows that the MPG classification reflects the 
genetic structure of the TN populations slightly better than other 
models, and that both tribal and non-tribal populations possess 
predominantly autochthonous lineages derived from a common 
gene pool established during the Late Pleistocene and Early 
Holocene. The distribution of over- and under-represented HGs 
suggests that populations within MPGs tend to share common 
genetic backgrounds. Using BATWING analysis, we estimate that 
social stratification for both tribal and non-tribal MPGs began 
between 6 Kya and 4 Kya, and detectable admixture between 
them has not occurred over the past 3 Kya, thereby allowing them 
to retain their genetic identity through cultural endogamy. 

Both the overall Y-chromosomal HG distribution and the 
divergence estimates for tribal and non-tribal groups are 
consistent with the archaeological dates and the demographic 
processes involved in the expansion of agriculture in South Asia. 
The South Deccan region near southern Karnataka and southwest 
Andhra Pradesh contains the earliest evidence for an integrated 
agro-pastoral system in South India, and likely acted as an 
agricultural center and source of dispersion around 5 Kya 
[30,31,34,69]. The genetic impact of the demographic processes 
involved in the development and spread of agriculture in 
India has been interpreted under the Frontier theory framework 
[30]. According to this model, agricultural groups rapidly 
expanding into new environments suitable for farming created 
moving frontiers where autochthonous lineages from multiple pre- 
existing hunting and gathering forager populations were assimi- 
lated into the new agriculturalist populations, thereby producing 
centers of greater genetic diversity with less evidence of isolated 
evolution than observed in foraging populations. This mechanism 
was proposed by Semino et al. for the convergence of multiple E-M123 
founders into Turkey prior to re-expansion into Europe in order to 
explain the high diversity for that haplogroup [70]. The genetic 
patterns observed in this study, such as the presence of the oldest 
age estimates of autochthonous HGs among the agricul- 
turalist non-tribal populations (DLF), could reflect the assimilation of 
paternal lineages from genetically diverse pre-existing populations 
into common gene pools, and suggest that today’s tribal 
groups are not the sole source of these lineages. 

In addition to this moving frontier, broader and more static 
agricultural frontier zones could also have arisen at later stages. In 
this area, stable and growing farming populations interacted with 
local foragers and created new cultural traditions, with some 
potential inter-marriage and assimilation through trade taking 
place. Southern Tamil Nadu and the Kerala zone represent one 
such agricultural frontier zone that has persisted to the present 
after local foragers began to adopt cultivation based on 
agricultural sedentism around 3 Kya [30]. Nowadays, TN tribes 
exhibit a wide variety of occupations and subsistence strategies, 
and mostly inhabit the Western Ghats Mountains, which harbor 
tropical and semi-tropical rain forests. In this context, two of the 
three tribal groups associated with foraging lifestyles (HTF and 
HTK) show the clearest signals of genetic drift, most likely due to 
strong founder effects and long-term isolation. They exhibit the 
lowest HG diversities (HTF: 0.687; HTK: 0.748), the highest 
proportion of putative autochthonous lineages (HTF: 95.3%; 
HTK: 88.5%), and the lowest ancestral effective population sizes 
estimated by BATWING (results not shown). In addition, the 
persistence of stronger genetic structure among HTF and HTK 
tribal populations, as seen in AMOVA, PCA and MDS analyses, 
suggests limited admixture with other TN populations. The 
absence of any human habitation sites in the Western Ghats until 
the Neolithic, and the late paleo-botanical evidence for cultivation, 
suggest a relatively late occupation of these mountains [34]. It is 
therefore possible that, upon agricultural expansion into previously 
non-cultivated areas, the present day tribal populations were 
displaced to more isolated regions, where they retained their mode 
of subsistence and genetic distinctiveness until the present day. 
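
As a worked illustration of how a haplogroup diversity value such as those quoted above can be computed, the following sketch uses Nei's unbiased gene diversity, h = n/(n-1) * (1 - sum of squared haplogroup frequencies). That this is the estimator behind the reported HTF and HTK values is an assumption on our part, and the haplogroup assignments in the example are hypothetical toy data rather than study data.

```python
from collections import Counter


def haplogroup_diversity(assignments):
    """Nei's unbiased gene diversity for a list of haplogroup labels.

    assignments: one haplogroup label per sampled chromosome (n >= 2).
    """
    n = len(assignments)
    counts = Counter(assignments)
    # Sum of squared haplogroup frequencies (probability of drawing the same HG twice).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor corrects for small sample size.
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical samples: a drifted group dominated by one HG vs. a more even group.
drifted = ["F"] * 8 + ["H"] * 2
even = ["F"] * 4 + ["H"] * 3 + ["L"] * 3
print(haplogroup_diversity(drifted), haplogroup_diversity(even))
```

A group dominated by a single lineage yields a lower value than a group with evenly distributed lineages, which is why low HG diversity is read here as a signal of drift and founder effects.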
The overall Y-chromosomal landscape of TN suggests a 
complex process of agricultural expansion, which can be explained 
in terms of the formation of moving and static frontiers since 
6 Kya, followed by migrations structured by habitat and 
occupation. However, because gene flow and differential assim- 
ilation of incoming migrations could alter the estimated divergence 
dates, they should be treated with caution. Our BATWING 
simulations and others from a previous study [62] have shown that 
topologies and population splits for modal trees are susceptible to 
admixture between already differentiated populations, which 
considerably reduces the times of split, but insensitive to migration 
into a region bringing new paternal lineages. This means that the 
divergence time estimates presented here likely reflect the latest 
major admixture that occurred among the populations being 
sampled from the TN region. In this regard, it is important to note 
that our BATWING estimates are concordant with historical 
records of major splits between two Vanniyar and between two 
Nadar populations, thereby supporting the ability of BATWING 
to detect recent demographic events. Thus, the main limitation of 
BATWING is its lack of power to detect earlier 
demographic events and its bias toward detecting recent gene 
flow among the populations studied. In any case, our conclusions 
supporting a common autochthonous Indian genetic heritage from 
the late Pleistocene/early Holocene for both tribal and non-tribal 
populations and refuting the hypothesis of the establishment of a 
structured and endogamous system due to an Indo-Aryan 
migration or implementation of the Varna System, still hold even 
if the BATWING divergence times are underestimates. 
Although previous genetic studies have already drawn some of 
the conclusions presented here [6,7,16,23], this is the first time 
(to our knowledge) that a genetic study has shown clear 
evidence of long-standing endogamous popula- 
tion identities within a highly structured Indian society established 
prior to the regional implementation of the Varna system. Further, 
these paternal genetic identities likely arose as a byproduct of 
demographic processes that occurred during the creation of 
moving and static frontiers of agricultural expansions into TN 
[30,69]. The meticulous sampling strategy focused on a local area, 
and comparison of genetic data with the paleoclimatic, arche- 
ological, and historical background information available for the 
region, allowed us to address these questions at a deeper level than 
previous studies have. Moreover, this approach reduced consid- 
erably the confounding relationships among socio-cultural factors 
allowing us to further explore and test in detail the relationships 
between ethnography and genetics. Indeed, the pattern of long- 
term separation among populations within and between MPGs, 
and the genetic affinities of the constituent populations within 
MPGs, are significant features that would be lost if populations 
were pooled by other proxies based on broad classifications such as 
tribal versus non-tribal categories or Varna rank-caste hierarchy. 
We were also able to show that not all of the tribal populations 
reflect the oldest genetic legacy of the region and that each tribal 
population has a unique and distinct evolutionary history. 

Thus, the sampling and analytical approach employed here 
suggests that detailed local genetic studies within India could give 
us new insights about the relative influences of past demographic 
events in relation to other socio-cultural and economic factors that 
might have influenced the population structure of the whole of 
India that is observed today. Nevertheless, it cannot be assumed 
that the same demographic processes or socio-cultural factors 
affected Indian populations from different regions in a similar 
manner. Whether corresponding Y chromosome genetic patterns 
can also be detected in other tribal and non-tribal populations 
within the South Deccan or in other Indian regions that have 
already been identified as centers of agricultural expansion is an 
open question that future studies could potentially address using 
the methods presented here. Finally, it would also be important to 
investigate the relative impact of the processes explained here on 
the diversity patterns in other genomic regions by studying 
mtDNA and autosomal variation. 


Supporting Information 


Figure S1 PCA plot showing the first two principal 
components of haplogroup frequencies for 97 non-tribal 
(circles) and tribal (squares) populations of India and 
nearby regions from previous publications, compared to 
the non-tribal (horizontal ovals) and tribal (diamonds) 
populations from the present study. Symbols have been 
colored according to linguistic classification. Population codes and 
references are shown in Table S3. 

(TIF) 

Figure S2 Reduced median network of 17 microsatellite 
haplotypes within haplogroups. (a) HG C-M130 using 74 
chromosomes, (b) HG H1-M52 using 292 chromosomes, (c) HG 
H-M69 using 79 chromosomes, (d) HG L1-M27/M76 using 235 
chromosomes, (e) HG R1a1-M17 using 214 chromosomes. Circles 
are colored based on the 7 Major Population Groups as shown in 
Figure 1, and the area is proportional to the frequency of the 
sampled haplotypes. Branch lengths between circles are propor- 
tional to the number of mutations separating haplotypes. 

(TIFF) 


Figure S3 Modal tree obtained by BATWING indicating 
the coalescence time divergence estimates (in years) 
among Major Population Groups (MPGs) using 17 STRs 
from haplogroup (a) F-M89, (b) H1-M52, (c) L1-M27/M76. 

(TIFF) 

Figure S4 Modal tree obtained by BATWING indicating 
the coalescence time divergence estimates (in years) 
among endogamous populations within (a) HTF and 
HTK groups, (b) DLF, (c) BRH and HTC, using 17 STRs 
from all haplogroups. 

(TIFF) 

Table S1 List of Y chromosome SNPs and haplotype 
data for the 1680 individuals from 31 tribal and non- 
tribal populations presented in this study. 


(XLS) 


Table S2 AMOVA analysis of various population group- 
ings based on the 17-STR haplotypes, with 95% CIs based on 
re-sampling of the samples across populations. 

(XLS) 

Table S3 List of population codes and their publication 
references used in Figure S1. 

(XLS) 

Table S4 Fisher's exact test p-values for the NRY HG 
frequencies among the 7 Major Population Groups and 
among the 31 sampled populations. 

(XLS) 


Acknowledgments 


The authors gratefully acknowledge all participants from Tamil Nadu, 
whose collaboration made this study possible. We thank all the field work 
assistants who helped us with sampling in various expeditions. We thank 
Prof N. Sukumaran and Dr. D.Ramesh for their help in sampling logistics 
at Tirunelveli and north Tamil Nadu, respectively. We thank Chella 
Software, Madurai, for developing and providing the “Input” programs for 
the Arlequin and Network software. We also thank Prof. Francesc Calafell, 
the late Prof. V. Sudarsen and Dr. Sumathi for helpful discussions, Dr. Peter 
Forster for kindly providing the Network Publisher software and Mrs. 
Mathuram for secretarial assistance at the Madurai Genographic 
Center. 

Genographic Consortium Members 

Christina J. Adler (University of Adelaide, South Australia, Australia), 
Elena Balanovska (Research Centre for Medical Genetics, Russian 
Academy of Medical Sciences, Moscow, Russia), Oleg Balanovsky 
(Research Centre for Medical Genetics, Russian Academy of Medical 
Sciences, Moscow, Russia), Jaume Bertranpetit (Universitat Pompeu 
Fabra, Barcelona, Spain), Andrew C. Clarke (University of Otago, 
Dunedin, New Zealand), David Comas (Universitat Pompeu Fabra, 
Barcelona, Spain), Alan Cooper (University of Adelaide, South Australia, 
Australia), Clio S. I. Der Sarkissian (University of Adelaide, South 
Australia, Australia), Matthew C. Dulik (University of Pennsylvania, 
Philadelphia, Pennsylvania, United States), Jill B. Gaieski (University of 
Pennsylvania, Philadelphia, Pennsylvania, United States), Wolfgang Haak 
(University of Adelaide, South Australia, Australia), Marc Haber (Lebanese 
American University, Chouran, Beirut, Lebanon), Angela Hobbs (National 
Health Laboratory Service, Johannesburg, South Africa), Asif Javed (IBM, 
Yorktown Heights, New York, United States), Li Jin (Fudan University, 
Shanghai, China), Matthew E. Kaplan (University of Arizona, Tucson, 
Arizona, United States), Shilin Li (Fudan University, Shanghai, China), 
Begoña Martínez-Cruz (Universitat Pompeu Fabra, Barcelona, Spain), 
Elizabeth A. Matisoo-Smith (University of Otago, Dunedin, New Zealand), 
Marta Melé (Universitat Pompeu Fabra, Barcelona, Spain), Nirav C. 
Merchant (University of Arizona, Tucson, Arizona, United States), R. John 
Mitchell (La Trobe University, Melbourne, Victoria, Australia), Amanda 
C. Owings (University of Pennsylvania, Philadelphia, Pennsylvania, United 
States), Lluis Quintana-Murci (Institut Pasteur, Paris, France), Daniela R. 
Lacerda (Universidade Federal de Minas Gerais, Belo Horizonte, Minas 
Gerais, Brazil), Fabricio R. Santos (Universidade Federal de Minas Gerais, 
Belo Horizonte, Minas Gerais, Brazil), Himla Soodyall (National Health 
Laboratory Service, Johannesburg, South Africa), Pandikumar 
Swamikrishnan (IBM, Somers, New York, United States), Pedro Paulo Vieira 
(Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil), Miguel 
G. Vilar (University of Pennsylvania, Philadelphia, Pennsylvania, United 
States), Pierre A. Zalloua (Lebanese American University, Chouran, 
Beirut, Lebanon). 


Author Contributions 


Conceived and designed the experiments: VJK AKR RSW RMP. 
Performed the experiments: GA VJK VSA AS KSA JSZ RMP. Analyzed 
the data: GA VJK DFSH LP CTS DEP RMP. Contributed reagents/ 
materials/analysis tools: DFSH JSZ LP DEP. Wrote the paper: GA DFSH 
CR TGS CTS DEP RMP. Field work, sample identification and collection 
of samples and demographic data: GA VJK VSA AS KSA KTG KV MN 
MJ RMP. 


References 


1. Cavalli-Sforza LL, Menozzi P, Piazza A (1994) The History and Geography of 
Human Genes. Princeton, NJ: Princeton University Press. 

2. Majumder PP (2010) The human genetic history of South Asia. Curr Biol 20: 
R184-187. 

3. Singh KS (2002) People of India: An Introduction, Revised edition: Oxford 
University Press. 

4. Metspalu M, Kivisild T, Metspalu E, Parik J, Hudjashov G, et al. (2004) Most of 
the extant mtDNA boundaries in south and southwest Asia were likely shaped 
during the initial settlement of Eurasia by anatomically modern humans. BMC 
genetics 5: 26. 

5. Kivisild T, Rootsi S, Metspalu M, Mastana S, Kaldma K, et al. (2003) The 
genetic heritage of the earliest settlers persists both in Indian tribal and caste 
populations. Am J Hum Genet 72: 313-332. 

6. Sahoo S, Singh A, Himabindu G, Banerjee J, Sitalaximi T, et al. (2006) A 
prehistory of Indian Y chromosomes: evaluating demic diffusion scenarios. Proc 
Natl Acad Sci U S A 103: 843-848. 

7. Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, et al. (2006) 
Polarity and temporality of high-resolution y-chromosome distributions in India 
identify both indigenous and exogenous expansions and reveal minor genetic 
influence of Central Asian pastoralists. Am J Hum Genet 78: 202-221. 

8. Chaubey G, Metspalu M, Kivisild T, Villems R (2007) Peopling of South Asia: 
investigating the caste-tribe continuum in India. Bioessays 29: 91-100. 

9. Quintana-Murci L, Semino O, Bandelt HJ, Passarino G, McElreavey K, et al. 
(1999) Genetic evidence of an early exit of Homo sapiens sapiens from Africa 
through eastern Africa. Nat Genet 23: 437-441. 

10. Cordaux R, Aunger R, Bentley G, Nasidze I, Sirajuddin SM, et al. (2004) 
Independent origins of Indian caste and tribal paternal lineages. Curr Biol 14: 
231-235. 

11. Indian Genome Variation Consortium (2008) Genetic landscape of the people of 
India: a canvas for disease gene exploration. Journal of genetics 87: 3-20. 

12. Majumder PP (2008) Genomic inferences on peopling of south Asia. Current 
opinion in genetics & development 18: 280-284. 

13. Basu A, Mukherjee N, Roy S, Sengupta S, Banerjee S, et al. (2003) Ethnic India: 
a genomic view, with special reference to peopling and structure. Genome 
research 13: 2277-2290. 

14. Dirks NB (2001) Castes of Mind: Colonialism and the Making of Modern India. 
Princeton, NJ: Princeton University Press. 

15. Sanghvi LD, Balakrishnan V, Karve I (1981) Biology of the People of Tamil 
Nadu.; Prune, editor. Calcutta: Indian Society of Human Genetics and Indian 
Anthropological Society. 

16. Sharma S, Rai E, Sharma P, Jena M, Singh S, et al. (2009) The Indian origin of 
paternal haplogroup R1a1(*) substantiates the autochthonous origin of Brahmins 
and the caste system. J Hum Genet 54: 47-55. 

17. Champaklakshmi R (2001) Reappraisal of a Brahmanical Institution: The 
Brahmadeya and its Ramifications in Early Medieval South India. In: Hall KR, 
editor. Structure and Society in Early South India: Essays in Honour of Noboru 
Karashima. Delhi: Oxford University Press. pp. 59-84. 

18. Krishnan KG (1984) Karandai Tamil Sangam Plates of Rajendrachola I. New 
Delhi: Archaeological Survey of India. 

19. Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, et al. (2001) 
The Eurasian heartland: a continental perspective on Y-chromosome diversity. 
Proc Natl Acad Sci U S A 98: 10244-10249. 

20. Bamshad M, Kivisild T, Watkins WS, Dixon ME, Ricker CE, et al. (2001) 
Genetic evidence on the origins of Indian caste populations. Genome research 
11: 994-1004. 

21. Zhao Z, Khan F, Borkar M, Herrera R, Agrawal S (2009) Presence of three 
different paternal lineages among North Indians: a study of 560 Y chromosomes. 
Annals of human biology 36: 46-59. 

22. Shastri KAN (1976) A history of South India : from prehistoric times to the fall of 
Vijayanagar. Madras: Oxford University Press. xii, 521 p., [512] leaves of 
plates p. 

23. Trivedi R, Sahoo S, Singh A, Bindu GH, Banerjee J, et al. (2008) Genetic 
Imprints of Pleistocene Origin of Indian Populations: A Comprehensive 
Phylogeographic Sketch of Indian Y-Chromosomes. International Journal of 
Human Genetics 8(1-2): 97-118. 

24. Carvalho-Silva DR, Tyler-Smith C (2008) The Grandest Genetic Experiment 
Ever Performed on Man? - A Y-Chromosomal Perspective on Genetic Variation 
in India. International journal of human genetics 8: 21-29. 

25. Thanseem I, Thangaraj K, Chaubey G, Singh VK, Bhaskar LV, et al. (2006) 
Genetic affinities among the lower castes and tribal groups of India: inference 
from Y chromosome and mitochondrial DNA. BMC genetics 7: 42. 

26. McCrindle JW (2000) Ancient India as described by Megasthenes and Arrian: 
Munshiram Manoharlal Pub Pvt Ltd. 

27. Silverberg J (1969) Social Mobility in the Caste System in India: An 
Interdisciplinary Symposium. The American Journal of Sociology 75: 443-444. 


November 2012 | Volume 7 | Issue 11 | e50269 


28. 


29. 


30. 


31. 


36. 


37. 


38. 


39. 


40. 


41. 


42. 


43. 


44, 


46. 


47. 


48. 


49. 
50. 


5. Selvakumar V (2002) Hunter-Gatherer Adaptations in Madurai Region, Tamil 


Srinivas MN (1952) Religion and Society among the Coorgs of South India. 
Oxford: Clarendon Press. 

Pappu S, Gunnell Y, Taieb M, Brugal J-P, Touchard Y (2003) Excavations at 
the Paleolithic Site of Attirampakkam, South India: Preliminary Findings. 
Jurrent Antrhopology 44: 591-598. 

Fuller DQ (2006) Agricultural origins and frontiers in South Asia: a working 
synthesis. J World Prehist 20: 1-86. 

Fuller DQ (2007) Non-human genetics, agricultural origins and _ historical 
linguistics in South Asia. In: Petraglia MD, Allchin B, editors. The Evolution 
and History of Human Populations in South Asia. Dordrecht, The Netherlands: 
Springer. pp. 393-443, 


. Haslam M, Korisettar R, Petraglia M, Smith T, Shipton C, et al. (2010) In 


Foote’s Steps: The History, Significance and Recent Archaeological Investiga- 
tion of the Billa Surgam Caves in South India. South Asian Studies 26: 1-19. 


. Misra VN (2001) Prehistoric human colonization of India. J Biosci 26: 491-531. 


Morrison KD (2007) Non-human genetics, agricultural origins and historical 
linguistics in South Asia. In: Petraglia MD, Allchin B, editors. The Evolution 
and History of Human Populations in South Asia. Dordrecht, The Netherlands: 
Springer. pp. 321-339. 


Nadu, India: From c. 10,000 BP to c. A.D. 500. Asians Perspectives 40: 71-102. 
Ramaswamy V (1994) Matallurgy and traditional metal crafts in Tamil Nadu 
(with special reference to bronze). Indian Journal of History of Science 29: 445— 
476. 

Kanakasabhai V (1904) ‘The Tamils Eighteen Hundred Years Ago. Madras and 
Bangalore: Higginbotham and Co. 

Thurston E, Rangachari K (1909) Castes and tribes of southern India. Madras: 
Government Press. 

Keay J (2001) India: a history. New York: Grove Press. 

Pitchappan RM, Kakkanaiah VN, Rajashekar R, Arulraj N, Muthukkaruppan 
VR (1984) HLA antigens in South India: I. Major groups of Tamil Nadu. Tissue 
antigens 24: 190-196. 

Rajasekar R, Kakkanaiah VN, Pitchappan R (1987) HLA antigens in South 
India. II: Selected caste groups of Tamil Nadu. Tissue antigens 30: 113-118. 
Pitchappan RM, Balakrishnan K, Sudarsen V, Brahmajothi V, Mahendran V, 
et al. (1997) Sociobiology and HLA genetic polymorphism in hill tribes, the Irula 
of the Nilgiri hills and the Malayali of the Shevroy hills, south India. Human 
biology 69: 59-74. 

Balakrishnan K, Pitchappan RM, Suzuki K, Kumar US, Santhakumari R, et al. 
(1996) HLA affinities of Iyers, a Brahmin population of Tamil Nadu, South
India. Human biology 68: 523-537. 

Watkins WS, Thara R, Mowry BJ, Zhang Y, Witherspoon DJ, et al. (2008) 
Genetic variation in South Indian castes: evidence from Y-chromosome, 
mitochondrial, and autosomal polymorphisms. BMC genetics 9: 86. 


Ramana GV, Su B, Jin L, Singh L, Wang N, et al. (2001) Y-chromosome SNP
haplotypes suggest evidence of gene flow among caste, tribe, and the migrant
Siddi populations of Andhra Pradesh, South India. European journal of human
genetics : EJHG 9: 695-700.

Kumar V, Reddy AN, Babu JP, Rao TN, Langstieh BT, et al. (2007) Y-
chromosome evidence suggests a common paternal heritage of Austro-Asiatic
populations. BMC Evol Biol 7: 47. 

Lokur Committee (1965) The Advisory Committee on the Revision of the Lists 
of Scheduled Castes and Scheduled Tribes (Lokur Committee). New Delhi: 
India, Government of. 

Constituent Assembly (1949) Constitution of India. In: Ministry of Law and
Justice, editor. New Delhi, India: Ministry of Law and Justice, India.

Mandal BP (1979) Mandal Commission. New Delhi.

Kalelkar K (1955) First Backward Classes Commission. New Delhi. 




Balanovsky O, Dibirova K, Dybo A, Mudrak O, Frolova S, et al. (2011) Parallel 
Evolution of Genes and Languages in the Caucasus Region. Molecular biology 
and evolution 28: 2905-2920. 

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, et al. 
(2008) New binary polymorphisms reshape and increase resolution of the human 
Y chromosomal haplogroup tree. Genome research 18: 830-838. 

Excoffier L, Laval G, Schneider S (2005) Arlequin (version 3.0): An integrated 
software package for population genetics data analysis. Evol Bioinform Online 1: 
47-90. 

Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular variance 
inferred from metric distances among DNA haplotypes: application to human 
mitochondrial DNA restriction data. Genetics 131: 479-491. 


Jolliffe I (1986) Principal Components Analysis, Second Edition. New York, NY:
Springer.

Nelson MR, Bryc K, King KS, Indap A, Boyko AR, et al. (2008) The Population 
Reference Sample, POPRES: a resource for population, disease, and 
pharmacological genetics research. Am J Hum Genet 83: 347-358. 


Cattell R (1966) The scree test for the number of factors. Multiv Behav Res 1:
245-276.

Kruskal JB (1964) Multidimensional scaling by optimizing goodness of fit to a
nonmetric hypothesis. Psychometrika 29: 1-27.


Bandelt HJ, Forster P, Röhl A (1999) Median-joining networks for inferring
intraspecific phylogenies. Molecular biology and evolution 16: 37-48.

Forster P, Röhl A, Lunnemann P, Brinkmann C, Zerjal T, et al. (2000) A short
tandem repeat-based phylogeny for the human Y chromosome. Am J Hum 
Genet 67: 182-196. 

Wilson I, Balding D, Weale M (2003) Inferences from DNA Data: Population 
Histories, Evolutionary Processes and Forensic Probabilities. Journal of the 
Royal Statistical Society: Series A (Statistics in Society) 166: 155-188. 

Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, et al. (2011) Influences of 
history, geography, and religion on genetic structure: the Maronites in Lebanon. 
European journal of human genetics : EJHG 19: 334-340. 

Xue Y, Zerjal T, Bao W, Zhu S, Shu Q, et al. (2006) Male demography in East 
Asia: a north-south contrast in human population expansion times. Genetics 
172: 2431-2439. 

Zhivotovsky LA, Underhill PA, Cinnioglu C, Kayser M, Morar B, et al. (2004) 
The effective mutation rate at Y chromosome short tandem repeats, with 
application to human population-divergence time. Am J Hum Genet 74: 50-61. 
Reich D, Thangaraj K, Patterson N, Price AL, Singh L (2009) Reconstructing 
Indian population history. Nature 461: 489-494. 

Sturrock K, Rocha J (2000) A Multidimensional Scaling Stress Evaluation 
Table. Field Methods 12: 49-60. 

Krithika S, Maji S, Vasulu TS (2009) A microsatellite study to disentangle the 
ambiguity of linguistic, geographic, ethnic and genetic influences on tribes of 
India to get a better clarity of the antiquity and peopling of South Asia. 
American journal of physical anthropology 139: 533-546. 

Underhill PA, Myres NM, Rootsi S, Metspalu M, Zhivotovsky LA, et al. (2010) 
Separating the post-Glacial coancestry of European and Asian Y chromosomes 
within haplogroup R1a. European journal of human genetics : EJHG 18:
479-484.

Morrison KD (2007) Foragers and forager-traders in South Asian worlds: some 
thoughts from the last 10,000 years. In: Petraglia MD, Allchin B, editors. The 
Evolution and History of Human Populations in South Asia. Dordrecht, The 
Netherlands: Springer. pp. 321-339. 

Semino O, Magri C, Benuzzi G, Lin AA, Al-Zahery N, et al. (2004) Origin, 
diffusion, and differentiation of Y-chromosome haplogroups E and J: inferences 
on the neolithization of Europe and later migratory events in the Mediterranean 
area. American journal of human genetics 74: 1023-1034. 


November 2012 | Volume 7 | Issue 11 | e50269 




OXFORD MONOGRAPHS ON MEDICAL GENETICS 


GENOMICS AND
HEALTH IN THE 
DEVELOPING WORLD 


EDITED BY 


DHAVENDRA KUMAR 


GENOMIC PERSPECTIVES OF PEOPLING AND LANGUAGES 
OF THE INDIAN SUBCONTINENT 


Ganesh Prasad Arunkumar, Varatharajan Santhakumari Arun, Adhikarla Syama, 
Valampuri John Mary Selvam Kavitha, and Ramasamy Pitchappan 


INTRODUCTION 


“Genomics,” the study of the genome of a species, is 
the buzz word of the twenty-first century, thanks to the 
Human Genome project that ushered in this new era of 
biology (Baltimore, 2001). The genomic tools are simple, 
straightforward and more accurate, thus making biol- 
ogy a more exact science, similar to physics and chem- 
istry. Nonetheless, the purpose of the studies and the 
design of the experiments become more critical in such 
a venture due to the enormous diversity and complexity 
of the biological phenomena that operate in evolutionary 
processes. The Indian subcontinent, the second success- 
ful home of humankind, is special in this evolutionary 
process due to her population’s long history, many migra- 
tions, isolation, divergence, and cultural evolution since 
the first emigration of man (Wells et al., 2001). The impact 
of natural selection that has operated on these disparate 
gene pools in an alien environment is a matter of intense 
scrutiny, since no parallel for such longstanding and
sympatrically isolated populations exists in other parts of the
world, apart from the birthplace of mankind in Africa.
Many of these isolations seem to have occurred prior to
language developments. The geographical, subsistence, and
cultural isolations presumably led to different language
developments in various parts of India. We attempt here
to interpret the Non Recombinant Y (NRY) chromosome 
polymorphisms of India in the context of migrations and 
origin of languages. 

In 1901, Karl Landsteiner, the discoverer of the human 
ABO blood group and Nobel laureate, first provided direct 
evidence for the existence of genomic diversity in human 
populations. In 1919, Hirszfeld and Hirszfeld found 
ABO gene variations among human populations. The 
B blood group was unique and most prevalent in South 
Asia, particularly in southern Indian tribal populations 
(Cavalli-Sforza et al., 1994). During the 1950s and 1960s, 
more systematic analysis of variation in genes and proteins
became possible with the detection by Pauling et al.
(1949) of blood protein polymorphisms in hemoglobin.
The 1980s were a transitional period, moving from the analysis
of gene polymorphisms to protein polymorphisms (Sanghvi
et al., 1981) and on to the study of DNA sequence
polymorphisms in the form of the Human Genome and other
variome projects (Baltimore, 2001). To better understand the
origin of this genomic diversity, one may need to study 
population-level forces such as migration and miscegena- 
tion, which play major roles in creating diversity. It has 
been proposed that genomic differentiation in popula- 
tions is mostly due to “fission” followed by independent 
evolution (Cavalli-Sforza, 1997). Mutations, natural
selection, and drift play important roles in shaping
diversity at the population level. While mutations supply raw
material for genomic diversity by introducing new alleles, 
their survival and expansion is dependent on their fitness 
and functional importance. The study of polymorphism 
at the single nucleotide (SNP) level in introns (noncoding 
region) or exons (coding region), or at the microsatellite 
level, becomes a powerful tool in studying genomic diver- 
sity in both health and disease. Recent literature on NRY 
chromosomes makes them ideal candidates to study pop- 
ulation diversity. NRY is evolutionarily a neutral marker, 
thus permitting us to reconstruct a population’s history. 
The distribution of NRY variation across the linguistic
states of India is therefore of particular interest: such
analyses shed light on population migrations and on the
development and spread of languages.


RECENT AFRICAN ORIGINS 


The hypothesis of a recent African origin and spread of
anatomically modern humans holds that Homo sapiens sapiens,
our species, evolved from a small African population
that subsequently colonized the whole world, supplanting
former hominids, ~120-200 thousand years
ago (kya), around the time of the first appearance of
anatomically modern humans (Cann et al., 1987). This
replacement model, now widely accepted, was
later called the “Out of Africa,” or “Recent African
Origin” (RAO) model, in contrast to the earlier
Multi-Regional Evolution Model (MRE; see Wolpoff et al.,
1984). Molecular evidence favors the RAO model. Older
populations, having evolved for longer, must have had more time
to accumulate genomic diversity. The excess African
diversity can thus be explained by older onset of popula- 
tion demographic expansion in Africa, combined with 
higher effective population size, population size fluctua- 
tions, and also periodic extinctions of populations out- 
side Africa or positive selection through adaptation to 
new environments outside Africa (Eller, 2001; Aquadro 
et al., 2001). The non-African patterns of genetic variation
are indeed a subset of African ones. Microsatellite
studies also showed a gradual reduction of diversity with
increasing distance from Africa, and linkage disequilibrium
values reflect the lower ages of haplotypes
in non-African populations (Tishkoff et al., 1996). The
Indian subcontinent, the second to be occupied by man, 
thus attracts our attention to investigate further in these 
directions. The RAO model proposes one, two, or multi- 
ple migrations using various routes over a period of time. 
Two routes have been proposed: the first is the “north- 
ern route” over Sinai, leading to eastern Asia through 
the steppes of central Asia and southern Siberia, and the 
second is the “southern route” over southern Arabia, fol- 
lowed by migration along the coastline of India. While 
the northern route model could explain the peopling of 
the whole of Eurasia by a single migration from Africa, 
the southern route model is interpreted as implying at 
least two separate late Pleistocene dispersal events, one 
leading to the northwest and the other to the east of 
Eurasia (Cavalli-Sforza et al., 1994). 


INDIAN CORRIDOR 


Being positioned at the tri-junction of African, northern 
Eurasian and oriental realms, India has served as a major 
corridor for the dispersal of modern humans (Cann, 2001) 
and attracted many streams of people since the Paleolithic, 
starting with the Late Pleistocene as supported by archae- 
ological evidence (Paddaya, 1982; Misra, 2001; Petraglia 
et al., 2010). Though modern anthropology tends to
reject somatoscopic and anthropological measurements,
there is a revival of interest in deciphering skin
color genes and studying their genome with modern tools
(Yuldasheva et al., 2002). Sanghvi and Karve, distinguishing
various castes of Tamil Nadu, India, found
that nose shape and skin color are the most discriminative
traits (Sanghvi et al., 1981). Even today, Indian physical
anthropologists use these older classifications,
identifying four different morphological groups in India
(Bhasin, 2006). These are: (i) “negritos,” characterized by
dwarf stature and frizzy hair, who are common in Nilgiri 
hills of Tamil Nadu (Paniya, Irula, and Kadar tribes) and 
the Andaman islands: we see them nowadays in many caste 
populations, including Brahmins; (ii) “proto-Australoids,”
characterized by long head, dark skin, and broad nose, 
found in central and southern India and speaking 
Dravidian languages/dialects; (iii) “Mongoloids,” char- 
acterized by broad face, medium stature, yellow skin, 
and slightly obliquely set eyes, exclusively found in sub- 
Himalayan and northeastern regions, speaking Austro- 
Asiatic (AA) or Tibeto-Burman (TB) languages; and (iv) 
“Dinaric” type (Mediterranean element) with medium to 
light pigment, hook nose, acrocephalic and round heads, 
found in Bengal and Orissa. The “Caucasoids” or the
“Nordic”, with blond hair and long heads and speaking
Indo-European (IE) languages, are most common in the
north and northwestern regions of India. The four major
language families of India seem to have their own
non-overlapping geographic clines. It will be interesting to
compare the distribution of the NRY markers with the
origin of these languages, and to ask whether these could
have arisen through fission and a long process of isolation
in various regions of India.


NRY PHYLOGENY IN INDIA 
AFRICAN ROOT 


The roots of the Y phylogeny in Africa have been dated to
around 100 kya (Underhill, 2003), characterized by HG
A-M91 and HG B-M60 NRY-SNP haplogroups (HGs),
and restricted to Africa. These migrations and subsequent
mutations formed the scaffold on which all other
Y-chromosome diversification with geographical cline
has occurred. The majority of Y lineages across the globe
are composed of a tripartite assemblage consisting of 
(1) HG C-M130, (2) HG D-M174 and HG E-M96, and 
(3) overarching HG F-M89, which defines the internal 
node of all remaining HGs, G-M201 through R-M207 
(Underhill et al., 2001; Wells, 2007). 


OUT OF AFRICA EMIGRATIONS 


The HG C-M130, not seen in any African populations,
presumably originated somewhere in Asia on an M168
lineage sometime after an early departure event (Capelli
et al., 2001; Underhill et al., 2001; Table 74-1). This clade
(C-M130) characterizes the first migrants into India,
whose descendants have been identified near Madurai (Wells et al.,
2001). This clade has many sublineages displaying irregu- 
lar geographic patterning consistent with diversification 


GENOMICS IN MEDICINE AND HEALTH—INDIAN SUBCONTINENT 


and northward migration of this HG C-M130, since the 
last ice age, with the westernmost limit in India: thus HG 
C3-M217, a transversion mutation, is common in East 
Asia and Siberia, with representatives in North America 
(Karafet et al., 2001; Lell et al., 2002), and eastern and 
central parts of central Asia, while C5-M356 is common 
in India (Sengupta et al., 2006). This lineage is absent in 
Indonesia, Oceania (Kayser et al., 2000), and Yunnan, 
China (Karafet et al., 2001). 

The ancestors who accumulated HG D-M174 and HG 
E-M96 mutations could have arisen in Africa or Asia 
(Underhill, 2003). HG E-M96 lineages are the most fre- 
quent in Africa, and display subsequent binary and mic- 
rosatellite diversification. Conversely, Asian haplogroup 
D-M174 occurs at low frequencies throughout eastern
Asia, except in remote and isolated locations like Tibet, 
Japan, and the Andaman islands (Underhill et al., 2001; 
Thangaraj et al., 2003). 

The third major and most successful subclade of M168 
lineages, characterized by super-haplogroup F-M89, 
defines the root from which all others (HGs G-M201
through R-M207) originated and have evolved outside
Africa (Kivisild et al., 2003). HG F-M89 diversified into 
many branches with region-specific markers—the Middle 
East showing HGs G-M201 and J-M304, Europe with HG 
I-M170, and India with F-M89 and H-M69 lineages, sel- 
dom observed elsewhere (Table 74-1). 


EXPANSION IN INDIA 


HG F*-M89* is a paraphyletic subcluster (unclassified
derivative) of M168 lineages, ubiquitous but found
at low frequency in various parts of India. Many
tribal populations of southern India possess higher fre- 
quencies of F*-M89* with high STR variance, particularly 
Dravidian speaking groups of Tamil Nadu and Koya of 
Orissa (Kavitha, 2008; Wells et al., 2001; Kivisild et al., 
2003; Cordaux et al., 2004a; Table 74-2). The high STR 
variance of HG F*-M89* from Tamil Nadu and Andhra 
Pradesh has suggested a deep time depth of 45,000 YBP 
(Sengupta et al., 2006). 
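
The link between high STR variance and deep time depth can be illustrated with a minimal sketch. It assumes that "STR variance" means the variance in repeat counts at each Y-STR locus, averaged over loci; the chapter does not spell out the estimator. The function name and both sets of haplotypes are hypothetical and are not data from the studies cited above.

```python
from statistics import pvariance


def mean_str_variance(haplotypes):
    """Average, over loci, of the variance in repeat counts across haplotypes.

    Each haplotype is a tuple of repeat counts, one per locus, in a fixed
    locus order. Population variance is used here; that choice is an
    assumption of this sketch.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    n_loci = len(haplotypes[0])
    if any(len(h) != n_loci for h in haplotypes):
        raise ValueError("all haplotypes must cover the same loci")
    per_locus = [pvariance([h[i] for h in haplotypes]) for i in range(n_loci)]
    return sum(per_locus) / n_loci


# Hypothetical three-locus haplotypes (repeat counts), for illustration only.
young_lineage = [(14, 12, 23), (14, 12, 23), (14, 13, 23), (14, 12, 23)]
old_lineage = [(13, 12, 22), (15, 14, 24), (14, 11, 23), (16, 13, 25)]

young_variance = mean_str_variance(young_lineage)
old_variance = mean_str_variance(old_lineage)
```

Under a stepwise mutation model, a lineage that has been accumulating mutations for longer is expected to show the larger value. Converting a variance into years additionally requires a mutation rate and a generation time, neither of which is assumed here.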


TABLE 74-1 THE AGES OF THE NRY HAPLOGROUPS AND THEIR DISCRIMINATING ALLELES
PREVALENT IN INDIA AND NEARBY REGIONS.

NRY HG | Marker | Estimated age of the mutation (YBP)* | Distribution | Reference
C | M130 | 50,000 | India, Australia, Central Asia, America | Genographic#
F | M89 | 45,000 | India | Genographic#
G | M201 | 30,000 | | Genographic#
H | M69 | 20,000-30,000 | | Genographic#
H1 | M52 | 25,000 | India | Genographic#
J | M304 | 31,700 | Middle East | Semino et al. (2004)
J2 | M172 | 15,000-20,000 | | Hammer et al. (2000)
K | M9 | 40,000 | | Genographic#
L | M20 | 30,000 | India | Genographic#
L1 | M27/M76 | 9100 | | Sengupta et al. (2006)
O | M175 | 35,000 | Orissa, North East, South East Asia | Genographic#
O2a | M95 | 11,700 | Orissa | Sengupta et al. (2006)
O3 | M122 | 10,000 | North East, South East Asia, China | Genographic#
P | M45 | 40,000 | North Asia | Wells et al. (2001)
Q | M242 | 15,000-18,000 | | Seielstad et al. (2003)
R | M207 | 30,000 | | Genographic#
R2 | M124 | 25,000 | India | Genographic#
R1a1 | M17 | 15,000 | Caucasus, Europe, India | Wells et al. (2001)

*The estimated ages have all been determined based on the available data: if it was not a representative sampling, then the age may vary.
Ascertainment bias is possible due to smaller samples and sampling errors; hence, some of these ages may need to be considered with
caution.

#www.nationalgeographic.com/genographic website.




TABLE 74-2 NRY HG ALLELE DISTRIBUTION DATA USED FOR COMPUTING PAN-ASIAN PCA 


Serial No | Population | Country / Region | Province / State | Language families | Population codes in Pan-Asian PCA, Fig 2 | Sample size | C-M130 | D-M174 | E*-M96 | G-M201 | I-M170 | F*-M89
st Pathan Pakistan IE Pak7 21 4.762 10) 0 9.524 0 4.762 
2 Sindhi Pakistan IE Pak8& 10) 10) 0 4.762 
3 Hazara Pakistan IE Pak4 0 0 4 (0) 
4 Kalash Pakistan IE Pak5S 0 20 10} 0 
— 
5 Makrani Pakistan IE Pak6 5 0 (0) 0 
6 Konka Brahmin India West | Goa IE Goat 10) 0 (0) 4.651 
alk 
ie Gujarat India West | Gujarat IE Guj3 10) 0 0 3.448 
8 Gujarat Brahmin India West | Gujarat IE Gujt 3.125 10.94 10) 10) 
9 Bhils India West | Gujarat IE Guj2 22 9.091 10) 10) 0 (0) 18.18 
10 Desasath Brahmin | India West | Maharashtra IE Mah2 16 6.25 0 0 10) 10) 10) 
44 Kathari India West | Maharashtra IE Mah3 19 10) 10) 0 0 0 26.32 
12 Maratha India West | Maharashtra IE MahS 36 5.556 10) 0 0 (0) 5.556 
43 Punjab India West | Punjab IE Pun2 66 3.03 0 0 0 0 4.515) 
14 Punjab Brahmin India North | Punjab IE Punt 49 4.082 (0) ie) 4.082 (0) 4.082 
a5: Kashmir Gujars India North | Jammu Kashmir | IE JS&KL 49 2.041 10) 0 10) (e) 4.082 
ie 
16 Kashmiri Pandits | India North | Jammu Kashmir | IE J&K2 51. 1.961 0 te) 1.961 0 3.922 
17 Rajput India North | Rajasthan IE Raji 29 3.448 0 0 0 10) 10.34 
18 Uttar Pradesh India North | Uttar Pradesh IE Uprd Si 0 (0) 0 10) 0 te) 
Brahmin 
——— ee 

19 Bihar Brahmins India Bihar IE Biht 56 1.786 (0) 10) 10) 0 0 

Central 
20 Madhya Pradesh India Madhya IE MP1 42 10) 10} 10) 0 0 2.381 

Brahmins Central Pradesh 

21 Halba India Maharashtra IE Mah4 2a 0 0 0 0 10) 23.81 

Central 
22 Karan India Orissa IE Orit 18 0 =) 10) 10) 0 10) 10} 

Central 
23 Oriya Brahmin: India Orissa IE Ori2 24 (0) 0 0 10) 10) 4.167 

Central 




TABLE 74-2 (CONTINUED) 


H1*-M52 | J2-M172 | L-M20/M11 | K-M9 | N-M231 | O*-M175 | O2a-M95 | O3-M122 | P-M74/M45 | Q*-M242/P36 | R*-M207 | R1a1-M17 | R2-M124 | Reference
9.524 0 9.524 (0) 0 (0) 0 0 fe) 9.524 14.29 38.1 Sengupta 
et al., 2006 
(0) 28.57 4.762 0 (0) (0) 0 0 0 4.762 0 Sengupta 
et al., 2006 
0 4 10) 0 0 0 0 8 10) 8 32 10) Sengupta 
et al., 2006 
20 10 25 10) 10) (0) (0) 10) 10) (0) 5 Sengupta 
et al., 2006 
0 25 20 0 0 (0) 0 0 0 5 10 Sengupta 
et al., 2006 
6.977 13.95 18.6 2.326 |0 (0) 0 {0} 0 (0) ie} 9.302 Kivisild et al., 
2003 
10.34 20.69 10.34 3.448 |0 0 0 (0) 6.897 |0 0 3.448 Kivisild et al., 
2003 
1.563 15.63 7.813 S125) |/3:425 io} O {0} 0 0 9.375 9.375 Sharma et al., 
| 2009 
9.091 18.18 18.18 ie} (e} ie} (0) 0 (0) (0) 0 18.18 Sharma et al., 
2009 
18.75 12.5 12)5 (0) 0 0 (0) (e} ie} {0} 0 Sahoo et al., 
2006 
36.84 5.263 5.263 ie} 0 0 0 (e} 5.263 |0 5.263 15.79 ie} Sahoo et al., 
2006 
30.56 19.44 ss Is 0 0 (0) 0 (0) 0 0 0 13.89 13.89 Sengupta 
et al., 2006, 
Sahoo et al., 
2006 
3.03 21.21 12.12 fe) 0 (e) 0 (0) 7.576 (0) 0 46.97 4.545 Kivisild et al., 
2003 
0 22.45 6.122 0 0 0 (0) {0} 0 (0) ie} 34.69 24.49 Sharma et al., 
2009 
10.2 6.122 16.33 8.163 | 0 0 (0) 0 0 2.041 2.041 40.82 8.163 Sharma et al., 
2009 
9.804 9.804 5.882 9.804 |0 0 (0) 0 (0) 5.882 17.65 19.61 13.73 Sharma et al., 
2009 
17.24 13.79 6.897 0 0 0 (0) 3.448 (e) (0) te) 31.03 13.79 Sengupta 
et al., 2006 
16.13 3.226 3.226 (e} ie} 0 fe) (0) 0 6.452 0 67.74 3.226 Sharma et al., 
2009 
0 8.929 8.929 3.571 |0 0 (e) (0) 0 3.572 3.574 64.29 5:357 Sahoo et al., 
2006, Sharma 
et al 2009 
7.143 23.81 7.143 (e} 2.381 fe) 0 (0) 2.381 | 4.762 (0) 38.1 11.9 Sharma et al., 
2009 
23.81 0 10} ie} 0 fe) 28.57 0 (0) 4.762 0 19.05 fe) Sengupta 
et al., 2006 
16.67 5.556 0 (e) 0 fe) (0) 0 (0) (0) 0 55.56 22.22 Sahoo et al., 
2006 
—————— 
8.333 4.167 20.83 fe) 0 0 (0) oO 4.167 fe) 4.167 41.67 42-5) Sahoo et al., 
2006 
(Continued) 


TABLE 74-2 (CONTINUED) 


Serial No | Population | Country / Region | Province / State | Language families | Population codes in Pan-Asian PCA, Fig 2 | Sample size | C-M130 | D-M174 | E*-M96 | G-M201 | I-M170 | F*-M89
24 Lambadi India South | Andhra Pradesh | IE AndQ 53 44532 Oo 0 10) 10} 3.774 
25 W.Bengal India East West Bengal IE WB1 Suk 3.226 (0) 10) 3.226 10) 6.452 
26 Karmali India East West Bengal IE WB4 16 10) 0 10} 0 10} 0 
27 Kora India East West Bengal IE WB5 17 1¢) 10) 10) 10) 10) A765: 
28 WB. Brahmin India East | West Bengal IE WB7 49 (e) (0) 0 0) fe) 0 
29 Garo India East Meghalaya TB NE2 33 0 10) 10) 0 10) 18.18 
30 Jamatia India East West Bengal TB NE3 30 10} 0 0 ie} 0 10) 
SHE Korku India West | Maharashtra AA Mahi 59 0 0 10) 0 fe) 15.25 
ir 
32 Asur India Jharkand AA Jha 55 0 0 0 0 0 25.45 
Central 
| entra | 
33 Birjia India Jharkand AA Jha2 24 0 0 0 ie) 0 0 
Central 
: Fi 
34 Korwa India Jharkand AA Jha3 42 10) 0 10) 0 10) Ss Weis} 
Central 
35 Savar India Jharkand AA Jha4 47 10) 10) 10) 0 0 40.43 
Central | 
36 Kharia India Jharkand AA Jha5 46 2.174 0 10} 10) 10) 39.13 
Central 
37 Munda India Jharkand AA Jha6 60 10} 10} 10) 0 0 23.33 
Central 
{ r | ai] 
38 Juang India Orissa AA Ori3 59 10) 10) (0) 10) 10) 1.695 
Central 
39 Ho India Orissa AA Orid 116 0 ie) | 10) 10) 10) 22.41 
Central 
40 Mahali | India West Bengal AA WB3 38 0 10} 10) (0) 10) 39.47 
Central 
| 
441 Khasi India East Meghalaya AA NEL 92 (0) 10) 10) 0 10) 17.39 
IE 
42 Mudi India East | West Bengal AA WB2 Sif 0 10) (0) 10) 0 45.95 
in 
43 Lodha India East West Bengal AA WB6 71 1.408 10) 0 0 0 14.08 




TABLE 74-2 (CONTINUED) 


H1*-M52 | J2-M172 | L-M20/M11 | K-M9 | N-M231 | O*-M175 | O2a-M95 | O3-M122 | P-M74/M45 | Q*-M242/P36 | R*-M207 | R1a1-M17 | R2-M124 | Reference


5.66 3.774 1432 3.774 |0 (0) (0) 0 33.96 |0 132 13.21 1.887 Sahoo et al., 
2006, Kivisild 
et al., 2003 


9.677 6.452 0 (0) 0 3.226 (0) 0 6.452 0 0 38.71 22.58 Kivisild et al., 
2003 


0 0 0 0 0 10) 10) 10) 0 100 Sahoo et al., 
2006 


(0) 0 10} 29.41 0 (0) 0 0 5.882 10} Sahoo et al., 
2006 


(0) 10} 0 0 0 (0) 0 (0) 71.43 22.45 Sharma 

et al., 2009, 
Sengupta 

et al., 2006 


(0) 0 3.03 18.18 54.55 (0) (0) 6.061 Kumar et al., 


2007 


te) 10) 10) 6.667 76.67 10) 10} 10) Sengupta 


et al., 2006 


(0) te) ie} 81.36 1.695 1.695 |0 0 


Kumar et al., 
2007 


10) 0 0 63.64 0 10) 0 1.818 Kumar et al., 


2007 


10) 10} 0 95.83 0 4.167 0 0 Kumar et al., 


2007 


ie} fe) 0 59,52 0 10) 4.762 Kumar et al., 


2007 


31.91 0 10) Kumar et al., 
2007 


0 12.77 0 (0) 0 0 14.89 


6.522 10} 10) Kumar et al., 
2007, Sahoo 
et al., 2006 


2.174 2.174 (0) 10} te) 2.174 45.65 


11.67 10} 6.667 Kumar et al., 
2007, Sahoo 
et al., 2006 


10} 0 (0) 1.667 |0 0 50 


0 fe) 0 {0} te) 0 98.31 ie} Sahoo et al., 
2006, Kumar 


et al., 2007 


0 0.862 0 (0) (e) 0 74.,.55' (0) 2.586 |0 2.586 Sengupta 

et al., 2006, 
Kumar et al., 
2007, Sahoo 


et al., 2006 


13.16 5.263 (0) 5.263 | 0 0 7.895 0 0 0 13.16 Kumar et al., 
2007, Sahoo 


et al., 2006 


te) Kumar et al., 


2007 


(0) 0 0) 0 (0) 2.174 41.3 29.35 4.348 |0 


(0) 2.703 (0) 10} ie} 0 43.24 (0) 2.703 (0) 2.703 Kumar et al., 


2007 


5.634 30.99 te) 2.817 |0 (e) 5.634 (e) 0 (0) 0 1.408 38.03 Sengupta 

et al., 2006, 
Kumar et al., 
2007 


(Continued) 




TABLE 74-2 (CONTINUED) 


Serial No | Population | Country / Region | Province / State | Language families | Population codes in Pan-Asian PCA, Fig 2 | Sample size | C-M130 | D-M174 | E*-M96 | G-M201 | I-M170 | F*-M89
44 Oraon India Jharkand DR Jha7 100 10} 0 0 10) 0 54 
Central 
1 
45 Muria India Orissa DR Ori4 20 10} (0) 10) 0 0 10 
Central 
+— 
46 Koraga India South | Andhra Pradesh | DR Andi 33 10) 6.061 0 10) (0) (0) 
47 Koya (oe South | Andhra Pradesh | DR And2 41 0 10} (0) 0 10} 36.59 
7 4 
48 Yerava India South | Andhra Pradesh | DR And3 x 26.83 0 0 te) 0 43.9 
49 Kappu naidu India South | Andhra Pradesh | DR And4 18 0 1¢) 0 5.556 0 (0) 
50 Komati India South | Andhra Pradesh | DR And5 20 (0) (0) (0) 5 (0) 10 
7 ] ' 
Bas Naikpod Gond India South | Andhra Pradesh | DR And6 Ps 22.22 0 0 0 10) Hele lh 
52 Raju India South | Andhra Pradesh | DR And7 19 10} 0 0 | 10) (0) 10) 
IE 
53 Yerkual India South | Andhra Pradesh | DR And8& 18 0 fe) 0 (¢) (0) 0 
7 
54 Konda Reddy India South | Andhra Pradesh | DR And10 30 ie} (e} fe) 0 (0) 23.33 
55 Koya Dora India South | Andhra Pradesh | DR And1it 27, (0) 0 0 1@) 0 25.93 
56 Andh | India South | Andhra Pradesh | DR | And1i2 54 ers 0 10) 0 10) 3.704 
57 lyer India South | Tamil Nadu DR TN1 29 6.897 ie) (0) 10.34 0 3.448 
I. 
58 Kurumba India South | Tamil Nadu DR TN2 19 10) 10) 10) 10) 10) 15.79 
59 lyengar India South | Tamil Nadu DR TN3 47 0 0 10} 8.511 0 0 
60 lrula India South | Tamil Nadu DR TN4 40 5 10} 0 10) 0 42.5 
61 Pallan India South | Tamil Nadu DR TN5 44 2.273 10} 0 0 0 6.818 
62 Kallar India South | Tamil Nadu DR TN6 93 6.452 (0) 0 0 10) 16.13 
63 Sinhalese SriLanka SriLanka DR SL1 39 10} 10) fe) 0 0 12.82 
; 
64 Burushaki Pakistan unclassified (a 20 5 0 10) 5 10) (0) 


Abbreviations of language families: IE=Indo-European, AF=Afro-Asiatic, DR=Dravidian, TB=Tibeto-Burman, ST=Sino-Tibetan, AA=Austro-Asiatic, AT=Altaic, BR=Burushaski.




H1*-M52 | J2-M172 | L-M20/M11 | K-M9 | N-M231 | O*-M175 | O2a-M95 | O3-M122 | P-M74/M45 | Q*-M242/P36 | R*-M207 | R1a1-M17 | R2-M124 | Reference
3 te) 0 10) te) 0 35 0 4 10} 4 0 3 Kumar et al., 
2007, Sahoo 
et al., 2006 
~ = | i ie 1 4 
80 0 te) (0) 0 io} 10 ie} 0 (0) {0} 0 Sengupta 
et al., 2006 
+ | 12 Ie ie i = | 4 | 
87.88 0 0 te) 10} ie} 0 0 te) 0 0 0 6.061 Cordaux et al., 
2004a 
T -— fH — | — — r 
60.98 0 10} 0 te) 0 (0) 0 0 0 (0) 2.439 io} Kivisild et al., 
200 
aa eal J He I | J M| ‘ 
19.51 0 0 0 0 0 fo) 0 0 0 (0) 9.756 0 Cordaux 
et al.,2004a 
Fi 1 [ ie C t 
fe) 0 0 0 (0) 0 (0) 0 0 (e) 44 4 Ae ala 72,22 Sahoo et al., 
2006 
il M +— j— if 
(0) 10) 10) (0) 10} 0 (0) 0 0 10} (0) 415 70 Sahoo et al., 
| | 2006 
r =I = al 
61e4e1 0 5.556 (0) (0) 0 0 ie} 0 (0) 0 {0} 0 Sahoo et al., 
2006 
[ L 
fe) 10.53 21.05 15.79 |0 10} (0) 0 0 (0) 15.79 26.32 10.53 Sahoo et al., 
2006 
- 
fo) 0 Ades: 55.56) |/0 0 0 0 0 fe) (0) 33.33 0 Sahoo et al., 
2006 
+ 
S333) fe) ie} 0 (0) 0 66.67 0 0 (e) (0) 6.667 0 Sengupta 
et al., 2006 
L 
22.22 3.704 0 0 (0) ie} 48.15 0 (6) fe) ie} te) 0 Sengupta 
et al., 2006 
16.67 35.19 1.852 (0) 0 fe) 1.852 1.852 (0) 0 0 31.48 5.556 Thanseem 
et al., 2006 
+— + 4+. T 
3.448 17.24 17.24 fe) 0 (0) 0 0 (e) 0 3.448 27.59 10.34 Sengupta 
t al., 2006 
le L | Ee 
68.42 fe) 5.263 fe) (0) (0) fe) 0 ie} 0 (0) (0) 10.53 Sengupta 
et al., 2006 
al | HE + 
23.4 19.15 19.15 0 (e) 0 fe) 0 oO 0 0 23.4 6.383 Sengupta 
et al., 2006, 
Sahoo et al., 
2006 
7 j ic iz Te T 
SD 2.5 7.5 (0) (0) (e) 0 fe) (e) (0) (0) 0 1D Sengupta 
et al., 2006, 
Sahoo et al., 
2006 
ie 4) I _ L + aI 
29.55 9.091 11.36 9.091 |0 0 (0) 0 0 (0) 2.273 15/91 13.64 Sengupta 
et al., 2006, 
Sahoo et al., 
2006 
\e Ie =| =| = { 
18.28 1.075 44.09 (0) (0) fo) 1.075 0 1.075 0 0 3.226 8.602 Wells et al., 
2001, Sahoo 
et al., 2006 
L 
- + + = = 
7.692 10.26 17.95 fo) 0 (0) 0 0 0 (0) (0) 12.82 38.46 Kivisild et al., 
2003 
IL = + i 
15 5 15 5 ie) 0 10) 5 0 10) 30 ie) 15, Sengupta 
il et al., 2006 
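
The haplogroup values in Table 74-2 appear to be percentages of each population's sample size, although the table does not say so explicitly. Under that assumption, a short sketch can recover the underlying chromosome counts and check a row for internal consistency. The function name and the tolerance are illustrative choices; the numbers are the nonzero entries of the Pathan row (sample size 21) as printed, listed without assigning them to particular haplogroup columns.

```python
def count_from_percent(percent, sample_size, tol=0.05):
    """Recover an integer chromosome count from a rounded percentage.

    Raises ValueError when the implied count is not close to a whole
    number, which would suggest a misread value or a wrong sample size.
    """
    estimate = percent * sample_size / 100.0
    nearest = round(estimate)
    if abs(estimate - nearest) > tol:
        raise ValueError(
            f"{percent}% of {sample_size} is not close to a whole number"
        )
    return nearest


# Nonzero entries of the Pathan row as printed in Table 74-2 (n = 21).
PATHAN_SAMPLE_SIZE = 21
PATHAN_NONZERO_PERCENTAGES = [
    4.762, 9.524, 4.762, 9.524, 9.524, 9.524, 14.29, 38.1,
]

pathan_counts = [
    count_from_percent(p, PATHAN_SAMPLE_SIZE)
    for p in PATHAN_NONZERO_PERCENTAGES
]
# If the row is complete, the counts should add up to the sample size.
row_is_consistent = sum(pathan_counts) == PATHAN_SAMPLE_SIZE
```

A check of this kind is useful before reusing the frequencies in a principal component analysis, because a row whose counts do not sum to the sample size points to a transcription problem.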


NRY HG H1-M52, a derivative of M69, has been
reported in higher frequencies in southern India, cut- 
ting across the caste and tribal boundaries (Wells 
et al., 2001; Kivisild et al., 2003). A few other studies 
have found HG H1-M52 with high STR variance in 
Maharashtra (Sengupta et al., 2006) and western India 
(Trivedi et al., 2008). Thanseem et al. (2006) have sug- 
gested that M52 originated in the Indian subcontinent 
immediately after the Late Pleistocene settlements. The 
available samples in literature show an estimated age of 
25,000 years (Table 74-1). 


NEOLITHIC CATTLE KEEPERS 


The J-M172 clade implicated in agricultural expansion 
through Neolithic cattle keepers is thought to have arisen 
in the Caucasus and Anatolia and spread to southwestern
Europe (Cavalli-Sforza et al., 1994; Semino et al.,
2004; Hammer et al., 1998; Rosser et al., 2000); Bedouin
and Palestinian Arabs possess the highest frequency of
this mutation (66%-55%) followed by Sephardic Jews
and Muslim Kurds (40%; see Semino et al., 2004). The 
J HG is divided into two sub-haplogroups, J1-M267 and 
J2-M172, with the former showing an ancestral Y-STR 
haplotype 14-16-23-11-12, for loci DYS19-DYS388- 
DYS390-DYS392-DYS393 and the latter 14-15-23-11-12 
(Giacomo et al., 2004). A one-step mutated haplotype 
of this J2, viz. 15-15-23-11-12, is the common clade in 
India: the Thodas of Nilgiris possess this J2 in higher fre- 
quencies, correlating with their pastoral buffalo cult life 
(Kavitha, 2008). It has been proposed that earlier migra- 
tions brought agriculture and Dravidian speakers into 
India, while another, much later one brought rice cultiva- 
tors from Southeast Asia (Diamond and Bellwood, 2003; 
Fuller, 2003). HG J2-M172 has shown a high STR diversity 
in Dravidian tribal populations, but to hypothesize that 
this HG J2-M172 is a part of a Neolithic expansion would
require more evidence (Thanseem et al., 2006). The avail- 
able datasets thus do not correlate well with these two 
major events. The archaeology once again does not sup- 
port this contention. 
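The "one-step" relationship between the Y-STR haplotypes quoted above can be checked with a short sketch. It assumes a simple stepwise model in which the distance between two haplotypes is the sum of absolute repeat-count differences per locus; the function names are hypothetical and this is not necessarily the procedure used in the cited studies.

```python
# Loci in the order given in the text for the J haplotypes.
LOCI = ("DYS19", "DYS388", "DYS390", "DYS392", "DYS393")

# Haplotypes as quoted in the text (Giacomo et al., 2004; Kavitha, 2008).
J1_ANCESTRAL = "14-16-23-11-12"
J2_ANCESTRAL = "14-15-23-11-12"
J2_INDIA = "15-15-23-11-12"


def parse_haplotype(text):
    """Turn '14-15-23-11-12' into a list of integer repeat counts."""
    return [int(part) for part in text.split("-")]


def step_distance(a, b):
    """Sum of absolute repeat differences (assumed stepwise model)."""
    ha, hb = parse_haplotype(a), parse_haplotype(b)
    if len(ha) != len(hb):
        raise ValueError("haplotypes must cover the same loci")
    return sum(abs(x - y) for x, y in zip(ha, hb))


def differing_loci(a, b, loci=LOCI):
    """Names of the loci at which the two haplotypes differ."""
    ha, hb = parse_haplotype(a), parse_haplotype(b)
    return [name for name, x, y in zip(loci, ha, hb) if x != y]


print(step_distance(J2_ANCESTRAL, J2_INDIA), differing_loci(J2_ANCESTRAL, J2_INDIA))
print(step_distance(J1_ANCESTRAL, J2_ANCESTRAL), differing_loci(J1_ANCESTRAL, J2_ANCESTRAL))
```

Under this assumption the common Indian J2 haplotype differs from the ancestral J2 haplotype by a single repeat at DYS19, which is what "one-step mutated" means in the text.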

Some studies have found HG J2-M172 at higher fre- 
quencies in Dravidian and Indo-European castes than 
in tribes (Sengupta et al., 2006; Cordaux et al., 2004a).
It is absent in East Asia, and typically present in Central 
Asia at frequencies of 10%-20%, leading Cordaux et al. 
to interpret that Indian HG J2-M172 originated from 
Central Asia rather than West Asia. The data listed in 
Table 74-2 shows the presence of J2 in a wide variety 
of populations across India. Neolithic markers of early
farmers (HGs E3-M35 and G-M201, which are prevalent
in Europe, Anatolia, the southern Caucasus, and Iran)
are, however, sporadic in Indians (Semino et al., 2000;
Underhill et al., 2001).




CENTRAL ASIAN EXPANSION 


An expansion of HG F-M89 lineages toward Central Asia
or the Caucasus also gave rise to a founder that acquired
the HG K-M9 mutation, defining another major bifur- 
cation in the phylogeny. Distinctive HG K-M9 sublin- 
eages have been observed in India, the Middle East, and 
Europe, while some HG K-M9 and HG M-M186 lineages 
are restricted to Oceania. Of the three major lineages of K-M9,
HG P-M45 is characteristic of North Asia, HG Q-M242 is found
in Siberia and North America, and the westward-expanding
HG R-M207 is found in Eurasia. K-M9 has
given rise to two offshoots, one HG L-M20 prevalent in 
the Indian subcontinent to become L1-M76 in southern 
India, and another HG O-M175 found in eastern Asia, the 
whole of oriental populations including the Chinese, and 
also in the Austro-Asiatic speakers and Tibeto-Burmese 
speakers of India. The genomic evidence further supports 
this (HUGO Pan-Asian SNP Consortium, 2009). 


ORIGIN OF L AND DRAVIDIAN SPEAKERS 


The NRY HG L-M20 is virtually absent in Europe, but 
found irregularly and at low frequencies in populations 
of the Middle East and southern Caucasus (Nebel et al.,
2001). It occurs at a frequency of 4.3% in Pakistan and 
13.5% in Central Asia (Qamar et al., 2002; Semino et al., 
2000; Wells et al., 2001). At a resolution of six STR loci, four
Chenchu tribal individuals from Andhra Pradesh shared 
a widespread common haplotype, 14-12-22-10-14-11
(DYS19-DYS388-DYS390-DYS391-DYS392-DYS393).
This is shared by Lambadis, Punjabis, and Iranians. An 
Armenian haplotype 15-12-23-10-13-11, commonly found 
in their HG L-M20, is a three-step mutation (Weale et al., 
2001). These differences indicate two distinct founders 
and independent expansions: more data is required to 
identify the antiquity of these populations. The hitherto 
available L subtyping data shows the presence of HG 
L1-M76 in many northwestern states and Dravidian- 
speaking southern belts of India (Trivedi et al., 2008). 
Sengupta et al. (2006) found a subtype of HG L-M20 to 
be the most common haplogroup in India, and proposed 
its early diversification in Dravidian speakers and subse- 
quent expansion toward peripheral regions, suggesting 
an Indian origin of Dravidian speakers. The Brahmin 
populations from Tamil Nadu have been considered as 
Dravidian speakers, to prove their argument. However, 
Sahoo et al. (2006) observed the absence of HG L-M20 in IE
speakers from Bihar, Orissa, and West Bengal, and
concluded that the distribution of NRY HGs in India was
associated with geography rather than linguistics. Among 
Austro-Asiatic (AA) speakers of India, as mentioned ear- 
lier, HG O2-M95 is predominantly a Southeast Asian 
marker (Basu et al., 2003) and virtually absent in central 


GENOMICS IN MEDICINE AND HEALTH—INDIAN SUBCONTINENT 


Asia (Wells et al., 2001). Thanseem et al. (2006) have found 
HG O2-M95 at the highest frequency in AA tribes (52%), 
and a deeper coalescence age (68,000 YBP) that does not 
fit with the history of other NRY clades. Non-AA castes 
and tribes have a frequency of this marker of 6.3%, and 
the scenario has suggested a footprint of earlier AA set- 
tlers carrying this defining mutation. HG O3-M122, and 
its sublineage HG O3e-M134 that spread through East 
Asia (Su et al., 2000) showed the highest frequency among 
Tibeto-Burman (TB) speakers of northeast India, while
the caste groups of the region possess only 3% (Trivedi 
et al., 2006). Further, since the coalescence age for HGs 
C-M130, H-M69, and R2-M124 was deeper compared to 
HG O-M175, they concluded that AA speakers could not 
have been the earliest settlers of India. More recent data 
in Table 74-1, however, suggest an estimated age of 35,000 
years for HG O-M175 and 11,700 years for HG O2a-M95.


R2 RESTRICTED TO INDIA AND ITS NEIGHBORS 


HG R2-M124, the last major clade of significance to appear
in India, is restricted to India, Pakistan, Iran, and southern 
Central Asia (Kivisild et al., 2003); however, this has been 
seen with highest frequency (53%) among Sinte Romani 
(Gypsy) (Wells et al., 2001). Cordaux et al. (2004a) have 
suggested that this HG R2 originated in India; this con- 
clusion was based on the presence of this clade in both 
Dravidian and Indo-European speakers. Within India 
it is predominant in the east coast and southern India 
(Sahoo et al., 2006). Network analysis of available data has 
depicted that a large number of haplotypes were shared 
between populations of South India, while the popula- 
tions of eastern India harbored more discrete haplotypes, 
originating in situ. 


THE ENIGMA OF R1A1 


Contrary to R2, the widespread northern Indian clade
among Brahmin-related groups, HG R1a1-M17, has been
linked with the recent spread of Kurgan culture
originating in southern Russia/Ukraine and dispersing to
Europe, Central Asia, and India between 3000 and 1000 BCE
(Passarino et al., 2001; Quintana-Murci et al., 2001; Wells
et al., 2001). In a global analysis, a deeper Palaeolithic
time depth of ~15,000 YBP for the HG R1a1-M17 mutation
has been suggested (Semino et al., 2000; Wells et al., 2001).
Further, two region-specific Y-STR allele patterns have
been associated with HG R1a1-M17 among Europeans
(Passarino et al., 2002): allele 15 at DYS19 and alleles 19
and 21 at locus YCA IIa,b against the background of HG
R1a1-M17 characterize populations of Western Europe,
while allele 16 at DYS19 and alleles 19 and 23 at YCA IIa,b
characterize Eastern European populations.




Interestingly, the high frequency of HG R1a1-M17 is
concentrated around the elevated terrain of central and
western Asia, and is present at a relatively low frequency
in the Caucasus and Middle East. In Central Asia, its
frequency is highest in the highlands among Tajiks, Kyrgyz,
and Altais (>50%) and drops to <10% in the plains
among the Turkmenians and Kazakhs (Wells et al., 2001;
Zerjal et al., 2002). In contrast to the above, other
studies have observed a high HG frequency in Central Asians
and lower average STR diversity than in Indian castes and
tribes. This has been attributed to a founder effect from
southern and western Asia during the early Holocene
expansion, contributing HG R1a1-M17 chromosomes to
both Central Asian and South Asian tribes prior to the
arrival of the Indo-European speakers (Kivisild et al.,
2003; Thanseem et al., 2006; Trivedi et al., 2008). Zerjal
et al. (2002), however, attributed the low Y-STR diversity
to a bottleneck effect in Central Asian populations. Some
authors also propose an Indian origin for HG R1a1-M17
based on the high frequency and associated STR
variance in India (Sharma et al., 2009), while others
attribute the origin to Central Asia (Wells et al., 2001). While
extensive subclades and subtypes have been identified for NRY
HG R1b, R1a1 has been the least studied (Underhill
et al., 2010) for lack of new markers; we await more data
from our Genographic Project
(www.nationalgeographic.com/genographic) in order to
further decipher the early population-movements scenario.


LANGUAGE CORRELATES OF NRY 
DISTRIBUTION—A PRINCIPAL COMPONENT 
ANALYSIS 


The overall NRY diaspora based on the hitherto avail- 
able data suggests a pattern of peopling of India. While 
HG R1a1 is prevalent in the northern Indian belt, the
HG O and its derivatives are predominantly seen in east- 
central and northeastern regions of India, mostly among 
tribals. HG L is restricted to various Dravidian speaking 
populations of India and some populations of Pakistan 
(Figure 74-2). The data in Table 74-2 and Figure 74-1, 
showing the stacked areas of various alleles in different 
populations, reveals a striking correlation between NRY 
composition and the languages they speak. This is further 
brought out by the principal component analysis (PCA) 
(Figure 74-2) of the data in Table 74-2. The first two com- 
ponents of the PCA account for more than 50% of the 
total variance (Figure 74-3). 
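The statement that the leading components account for a given share of the total variance can be illustrated with a minimal sketch. The frequencies below are invented for illustration and are not taken from Table 74-2. The sketch also assumes a covariance-based PCA computed by power iteration, which is not necessarily the software or scaling used for Figures 74-2 and 74-3.

```python
import math

# Hypothetical haplogroup frequencies (rows: populations; columns: three
# haplogroups). Invented numbers, NOT taken from Table 74-2.
FREQS = [
    [0.60, 0.10, 0.30],
    [0.50, 0.20, 0.30],
    [0.10, 0.70, 0.20],
    [0.20, 0.60, 0.20],
]


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    p = len(means)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def mat_vec(matrix, vec):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def leading_component(matrix, iterations=200):
    """Power iteration: (largest eigenvalue, unit eigenvector) of a
    symmetric positive semi-definite matrix."""
    # Non-uniform start vector: when each row of frequencies sums to 1,
    # the covariance matrix maps the all-ones vector to zero.
    vec = [float(i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


cov = covariance_matrix(FREQS)
eigenvalue, loadings = leading_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))  # trace
share_pc1 = eigenvalue / total_variance
print(f"PC1 explains {share_pc1:.1%} of the total variance")
```

The share explained by a component is its eigenvalue divided by the trace of the covariance matrix; a scree plot such as Figure 74-3 displays these eigenvalues in decreasing order.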

The geographical distribution of the NRY HGs 
described in the previous paragraph is clearly brought out 
in the PCA plot of the NRY HG frequency distributions
in these populations. Speakers of the three language families,
i.e., Indo-European (IE), Dravidian (DR), and Austro-Asiatic (AA),
irrespective of tribes or castes, are seen to be influenced by 




[Figure 74-1: 100% stacked area chart. Legend: M124 R2; M17 R1a1; M207 R*; M122 O3; M95 O2a; M20/M11 L; M172 J2; M52 H1*; M89 F*; M130 C. Populations are grouped by language.]

Figure 74-1 Picture of 100% stacked area of NRY HG allelic composition of 58 Indian and 5 Pakistani
populations (totaling 2447 samples) arranged according to the language they speak at present. A clear trend of
various alleles in different language speakers was discernible. Note the relative distributions of R1a1-M17
(yellow color), O2a-M95 (violet), H1-M52 (green), L-M20 (pink), F*-M89 (red), and C-M130 (black) in
various language speakers. P = Pakistan; IE = Indo-European speakers; TB = Tibeto-Burmese speakers;
AA = Austro-Asiatic speakers; and DR = Dravidian speakers, all from India. For exact population caste/tribe
names and their references, refer to Table 74-1. Refer color figure.


various eigenvectors: thus the Dravidian speakers, mostly
tribals, are distributed in the upper right quadrant, while
the Orissa, West Bengal, and northeastern tribal populations
speaking AA languages cluster in the bottom right
quadrant of the plot. Many Brahmin and other populations
of northern India, speaking IE languages, are clustered
in the bottom left quadrant of the plot. The overlap
between IE speakers and DR speakers seen in the middle
of the plot can be attributed to either a confluence of two
ancestors, miscegenation, or founders to varying degrees,
or to language replacement. The populations found at the
extremes, with the highest component scores in one direction,
possessed the highest frequencies of one or another NRY allele.
This can be attributed to a small founder or bottleneck effect,
and uninterrupted expansion without any foreign gene
flow. The terrain and climate of eastern central India
and northeastern India favor such a population expansion.
However, the absence or low frequencies of many
other NRY clades in the AA speakers, and the concentration
of these tribal populations in huge numbers in the


[Figure 74-2: PCA scatter plot of populations, marked by language group (AA, DR, IE, TB, Unclassified) and labelled with population codes such as And6, TN4, Guj3, WB1, Jha5, and Pak6. Horizontal axis: Component 1.]

Figure 74-2 The principal component analysis (PCA) of NRY/HG data of Indian populations available in
literature. Refer color figure.
NOTE: The population Hazara from Pakistan was removed from analysis, because it showed very high
variance, making differentiation among the rest of the populations difficult.



[Figure 74-3: scree plot. Horizontal axis: Comp.1 to Comp.10; vertical axis ticks from 1.0 to 3.0.]

Figure 74-3 The Scree Plot of the principal component analysis of NRY HG data of Indian populations
available in literature.


eastern central India/Orissa belt and the northeastern hilly 
tracts of India suggests a concomitant origin of these NRY 
clades and their language, and a spurt of huge expansion 
from the small founder. This is reiterated by the observa- 
tion that all these populations possessed very little of other 
parallel and later-derived NRY HGs. 

This proposition is further supported by the data that 
the Chinese and other oriental populations carrying
derivatives of O3 are descendants of a common ancestor
somewhere in the northeast/Myanmar region (Shi
et al., 2005). Basu et al. (2003) have suggested that the AA 
and TB speaking tribal groups might have entered India 
first from a northwest corridor and, much later, some 
through a northeast corridor. Contrary to this, Cordaux 
et al. (2004b) have proposed that northeast India acted as 
a barrier. Kumar et al. (2007), in an extensive recent study, 
identified a strong genetic link among sublinguistic groups 
of Indian AA-speaking populations and suggested an
origin of AAs in India who later spread to Southeast Asia. 
The analysis of the present study, in light of the population 
size and the extent of distribution of these AA-speaking 
tribal groups, reiterates the concomitant origin of these 
clades and their language. The time of origin of the clade 
O2a, i.e., 11,700 years ago (Sengupta et al., 2006; Table 
74-1) fits well with the assumptions that spoken language 
originated ~10,000 years ago. A very interesting observa- 
tion was the higher frequencies (50%-90%) of O2a in half 
of the AA-speaking populations hitherto available in lit- 
erature (Table 74-2). 


CONCLUSION 


India, the second continent to be successfully occupied 
by modern man, is heterogeneous in itself, in terms of 




geography, climate, and populations. The whole of India 
thus cannot be considered as a single gene pool. The 
migratory history as revealed by NRY shows definitive 
pathways, origin, and autochthonous expansion of vari- 
ous NRY clades and populations in different parts of the 
country. Many of these populations are more ancient than the
languages they speak. Thus, as various languages devel- 
oped, presumably in small founders, the population 
expansion and language spread must have taken place 
concomitantly: hence, we see a good correlation between 
languages and NRY in India. 


SUMMARY 


Modern man (Homo sapiens sapiens), originating in 
Africa, first emigrated ~70,000 years ago, walked through 
the coasts of India (southern coastal route model) and 
reached Australia. Since then many migrations, settle- 
ments, and expansions have taken place in various parts 
of India. The island model of human settlements and 
expansions may explain the origin of settled communities
and languages in India. NRY chromosome markers
help to unravel the details of early migrations of man into 
South Asia. The analysis of the literature thus suggests 
the origin and expansion of languages, superimposed 
by the genomic data: the data implies small founders, 
autochthonous origin (mutation) of new NRY markers, 
nuclear origins, and uninterrupted expansion / dispersal 
of populations and languages in India as exemplified by 
Austro-Asiatic (AA) speakers. The better communica- 
tion means and the language presumably led to the settle- 
ments (founders), rapid expansion, formation of culture 
and societies, and their dispersal to newer horizons and 
territories. South Asia and Southeast Asia thus seem to 




be the cradle of many new founders, and autochthonous 
civilizations commensurate with languages that are pre- 
served till today. 


REFERENCES 


Aquadro CF, DuMont BB, Redd FA. (2001). Genome wide variation 
in human and fruitfly: a comparison. Curr Opinion Genet Dev 
11:627-634. 

Baltimore D. (2001). Our genome unveiled. Nature 409:814-816. 

Basu A, Mukherjee N, Roy S, et al. (2003). Ethnic India: A genomic 
view, with special reference to peopling and structure. Genome Res 
13:2277-2290. 

Bhasin MK. (2006). Genetics of castes and tribes of India: Indian pop- 
ulation milieu. Int J Hum Genet 6(3):233-274. 

Cann RL, Stoneking M, Wilson AC. (1987). Mitochondrial DNA and 
human evolution. Nature 325:31-36. 

Cann RL. (2001). Genetic clues to dispersal of human populations: 
Retracing the past from the present. Science 291:1742-1748. 

Capelli C, Wilson JF, Richards M, et al. (2001). A predominantly indig- 
enous paternal heritage for the Austronesian speaking peoples of 
insular South East Asia and Oceania. Am J Hum Genet 68:432-443.

Cavalli-Sforza LL, Menozzi P, Piazza A. (1994). The history and geog- 
raphy of human genes. Princeton, New Jersey: Princeton University 
Press, p. 1088. 

Cavalli-Sforza LL. (1997). Genes, peoples, and languages. Proc Natl 
Acad Sci US A 94:7719-7724. 

Cordaux R, Aunger R, Bentley G, Nasidze I, Sirajuddin SM, Stoneking 
M. (2004a). Independent origins of Indian caste and tribal paternal 
lineages. Curr Biol 14:231-235. 

Cordaux R, Weiss G, Saha N, Stoneking M. (2004b). The northeast 
Indian passageway: a barrier or corridor for human migrations. 
Mol Biol Evol 21:1525-1533.

Diamond J, Bellwood P. (2003). Farmers and their languages: The first 
expansions. Science 300:597-603. 

Eller E. (2001). Estimating relative population sizes from simulated 
data sets and the question of greater African effective size. Am J 
Phys Anthropol 116:1-12. 

Fuller D. (2003). An agricultural perspective on Dravidian historical 
linguistics: archaeological crop packages, livestock and Dravidian 
crop vocabulary. In Bellwood P, Renfrew C, eds. Examining The 
Farming/Language Dispersal Hypothesis. Cambridge: McDonald 
Institute for Archaeological Research, pp. 191-213. 

Giacomo Di F, Luca F, et al. (2004). Y chromosome haplogroup J as a
signature of the post-neolithic colonization of Europe. Hum Genet 
115:357-371. 

Hammer MF, Karafet MT, Rasanayagam A, et al. (1998). Out of Africa 
and back again: nested cladistic analysis of human Y chromosome 
variation. Mol Biol Evol 15:427-441. 

Hammer MF, Redd AJ, Wood ET, et al. (2000). Jewish and Middle
Eastern non-Jewish populations share a common pool of 
Y-chromosome biallelic haplotypes. Proc Natl Acad Sci US A 
97(12):6769-774. 

HUGO Pan-Asian SNP Consortium. (2009). Mapping Human Genetic 
Diversity in Asia. Science 326(5959):1541-1545. 

Karafet TM, Xu L, Du R, et al. (2001). Paternal population history of 
East Asia: Sources, patterns, and microevolutionary processes. Am 
J Hum Genet 69:615-628.

Kavitha VJ. (2008). Studies on the Genomic Diversity of Southern Indian 
Breeding Isolates. PhD Thesis. Madurai Kamaraj University, India. 

Kayser M, Brauer S, Weiss G, et al. (2000). Melanesian origin of 
Polynesian Y chromosomes. Curr Biol 10:1237-1246.

Kivisild T, Rootsi S, Metspalu M, et al.(2003). The genetic signatures of 
earliest settlers persist in Indian tribal and caste populations. Am J 
Hum Genet 72:313-332. 




Kumar V, Reddy AN, Babu JP, et al. (2007). Y-chromosome evidence 
suggests a common paternal heritage of Austro Asiatic popula- 
tions. BMC Evol Biol 28;7:47. 

Lell JT, Sukernik RI, Starikovskaya YB, et al. (2002). The dual origin
and Siberian affinities of Native American Y chromosomes. Am J 
Hum Genet 70:192-198. 

Misra VN. (2001). Prehistoric colonisation of India. J Biosci
26:491-531.

Nebel A, Filon D, Brinkmann B, Majumder PP, Faerman M, 
Oppenheim A. (2001). The Y chromosome pool of Jews as part 
of the genetic landscape of the Middle East. Am J Hum Genet 
69:1095-1112. 

Paddaya K. (1982). The Transition from Lower to Middle 
Paleolithic and the Origin of Modern Man. Ronen A, ed. British 
Archaeological Reports International series, Oxford, U.K.: vol 
151, pp. 257-264. 

Passarino G, Semino O, Magri C, et al. (2001). The 49a,f haplotype
11 is a new marker of the EU19 lineage that traces migrations
from northern regions of the Black Sea. Hum Immunol
62:922-932.

Passarino G, Cavalleri GL, Cavalli-Sforza LL, Borresen-Dale A-L, 
Underhill PA. (2002). Different genetic components in the 
Norwegian population revealed by the analysis of mtDNA and Y 
chromosome polymorphisms. Eur J Hum Genet 10:521-529. 

Pauling L, Itano HA, Singer SJ and Wells IG. (1949). Sickle-cell ane- 
mia, a molecular disease. Science 110:543-548. 

Petraglia MD, Haslam M, Fuller DQ, Boivin N. (2010). The southern 
dispersal route and the spread of modern humans along the Indian 
Ocean rim: New hypotheses and evidence. Annals of Human 
Biology 37(3):288-311 

Qamar R, Ayub Q, Mohyuddin A, et al. (2002). Y chromosomal DNA 
variation in Pakistan. Am J Hum Genet 70:1107-1124. 

Quintana-Murci L, Krausz C, Zerjal T, et al. (2001). Y-chromosome
lineages trace diffusion of people and languages in Southwestern 
Asia. Am J Hum Genet 68:537-542. 

Rosser ZH, Zerjal T, Hurles ME, et al.(2000). Y Chromosomal Diversity 
in Europe Is Clinal and Influenced Primarily by Geography, Rather 
than by Language. Am J Hum Genet 67:1526-1543. 

Sahoo S, Singh A, Himabindu G, et al. (2006). Prehistory of Indian 
Y chromosomes: Evaluating demic diffusion scenarios. Proc Natl 
Acad Sci U S A 103(4):843-848. 

Sanghvi LD, Balakrishnan V, Karve I. (1981). Biology of the people 
of Tamil Nadu. Pune: Indian Society of Human Genetics and 
Calcutta: Indian Anthropological Society. 

Seielstad M, Yuldasheva N, Singh N, et al. (2003). A novel Y-chromosome
variant puts an upper limit on the timing of first entry into the 
Americas. Am J Hum Genet 73(3):700-705. 

Semino O, Passarino G, Oefner PJ, et al.(2000). The genetic legacy of 
Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromo- 
some perspective. Science 290:1155-1159. 

Semino O, Magri C, Benuzzi G, et al.(2004). Origin, Diffusion and 
Differentiation of Y-chromosomal Haplogroups E and J: Inferences 
on Neolithization of Europe and later migratory events in the 
Mediterranean Area. Am J Hum Genet 74:1023-1034.

Sengupta S, Zhivotovsky LA, King R, et al. (2006). Polarity and
temporality of high-resolution Y chromosome distributions in India
identify both indigenous and exogenous expansions and reveal minor
genetic influence of central Asian pastoralists. Am J Hum Genet
78:202-221.

Sharma S, Rai E, Sharma P, et al. (2009). The Indian origin of
paternal haplogroup R1a1* substantiates the autochthonous origin of
Brahmins and the caste system. J Hum Genet 54(1):47-55.

Shi H, Dong YL, Wen B, et al. (2005). Y-chromosome evidence of 
southern origin of the East Asian-specific haplogroup O3-M122. 
Am J Hum Genet 77(3):408-419. 

Su B, Xiao C, Deka R, Seielstad MT, et al. (2000). Y chromosome haplo- 
types reveal prehistorical migrations to the Himalayas. Hum Genet 
107:582-590 






Thangaraj K, Singh L, Reddy AG, et al. (2003). Genetic affinities of the
Andaman Islanders, a vanishing human population. Curr Biol 13:86-93.

Thanseem I, Thangaraj K, Chaubey G, et al. (2006). Genetic affinities
among the lower castes and tribal groups of India: inference from
Y chromosome and mitochondrial DNA. BMC Genet 7:42.

Tishkoff SA, Dietzsch E, Speed W, Pakstis AJ, Kidd JR. (1996). Global
patterns of linkage disequilibrium at the CD4 locus and modern
human origins. Science 271:1380-1387.

Trivedi R, Sitalaximi T, Banerjee J, Singh A, Sircar PK and Kashyap
VK. (2006). Molecular insights into the origins of the Shompen,
a declining population of the Nicobar archipelago. J Hum Genet
51:217-226.

Trivedi R, Sahoo S, Singh A, et al. (2008). Genetic Imprints of
Pleistocene Origin of Indian Populations: A comprehensive
Phylogeographic sketch of Indian Y-Chromosomes. Int J Hum Genet
8(1-2):1-30.

Underhill PA, Passarino G, Lin AA, et al. (2001). The phylogeography
of Y chromosome binary haplotypes and the origins of modern
human populations. Ann Hum Genet 65:43-62.

Underhill PA. (2003). Inferring Human History: Clues from
Y-chromosome Haplotypes. Cold Spring Harbor Symposia on
Quantitative Biology, Cold Spring Harbor Laboratory Press,
Volume LXVIII, pp. 487-493.

Underhill PA, Myres NM, Rootsi S, et al. (2010). Separating the
post-Glacial coancestry of European and Asian Y chromosomes within
haplogroup R1a. Eur J Hum Genet 18(4):479-484.

Weale ME, Yepiskoposyan L, Jager RF, et al. (2001). Armenian
Y chromosome haplotypes reveal strong regional structure within a single
ethno-national group. Hum Genet 109:659-674.

Wells RS, Yuldasheva N, Ruzibakiev R, et al. (2001). The Eurasian
Heartland: a continental perspective on Y-chromosome diversity.
Proc Natl Acad Sci USA 98:10244-10249.

Wells S. (2007). Deep Ancestry: Inside the Genographic Project.
National Geographic Society.

Wolpoff MH, Wu X and Thorne AG. (1984). Modern Homo sapiens
origins: A general theory of hominid evolution involving the fossil
evidence from East Asia. In: Smith FH, Spencer F, eds. The Origins of
Modern Humans: A World Survey of the Fossil Evidence. New York:
Alan R. Liss, pp. 405-467.

Yuldasheva N, Wells RS, Ruzibakiev R, et al. (2002). High
Polymorphism at the MC1R locus is consistent with phenotype
diversity in Eurasian populations. Paper presented in The Human
Genome Organisation, Shanghai, China.

Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C. (2002).
A genetic landscape reshaped by recent events: Y chromosomal
insights into Central Asia. Am J Hum Genet 71:466-482.


